Python 3 (Tabula) Example Script to Extract All Tables From PDF Document and Convert to XLSX and CSV Files Full Project For Beginners


 

Welcome folks! In this post we will extract all the tables present in a PDF document using the tabula library in Python, and store them as Excel (XLSX) and CSV files. The full source code of the application is given below.

 

 

Get Started

 

 

 

To get started, install the required libraries with the pip command shown below. The tabula library is a wrapper around the Java-based Tabula tool, so Java must also be installed on your machine.

 

pip install tabula-py openpyxl
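Because the script depends on both Python packages and a Java runtime, a quick pre-flight check can save some confusion. The following is only a minimal sketch: the helper name `check_prerequisites` is hypothetical (it is not part of tabula), and it assumes Java is expected to be reachable as `java` on your PATH.

```python
import importlib.util
import shutil


def check_prerequisites(modules=("tabula", "openpyxl"), executable="java"):
    """Return a list of problems found; an empty list means nothing is missing.

    Illustrative helper only, not a function provided by the tabula library.
    """
    problems = []
    # The tabula library typically relies on a Java runtime found on PATH
    if shutil.which(executable) is None:
        problems.append(f"'{executable}' was not found on PATH")
    # find_spec returns None when a top-level module cannot be located
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python module '{name}' is not installed")
    return problems
```

Calling `check_prerequisites()` before running the script and printing any returned messages should point at whichever piece is still missing.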

 

After that, create an app.py file and copy and paste the following code:

 

app.py

 

import tabula
import os
# uncomment if you want to pass pdf file from command line arguments
# import sys

# read PDF file
# uncomment if you want to pass pdf file from command line arguments
# tables = tabula.read_pdf(sys.argv[1], pages="all")
tables = tabula.read_pdf("table.pdf", pages="all")

# save them in a folder
folder_name = "tables"
# exist_ok=True makes this a no-op when the folder is already there
os.makedirs(folder_name, exist_ok=True)
# iterate over extracted tables and export as excel individually
for i, table in enumerate(tables, start=1):
    table.to_excel(os.path.join(folder_name, f"table_{i}.xlsx"), index=False)

# convert all tables of a PDF file into a single CSV file
# supported output_formats are "csv", "json" or "tsv"
tabula.convert_into("table.pdf", "output.csv", output_format="csv", pages="all")
# convert all PDFs in a folder into CSV format
# `pdfs` folder should exist in the current directory
tabula.convert_into_by_batch("pdfs", output_format="csv", pages="all")
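The commented-out `sys.argv` lines in the script hint at passing the PDF path on the command line. One way this might look is sketched below, using `argparse` instead of raw `sys.argv`. The names `parse_args`, `table_path`, `extract` and the `--out` option are assumptions made for illustration; only `tabula.read_pdf` and the DataFrame `to_excel` call come from the script above.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the PDF path and an optional output folder (hypothetical CLI)."""
    parser = argparse.ArgumentParser(description="Extract tables from a PDF")
    parser.add_argument("pdf", help="path to the input PDF file")
    parser.add_argument("--out", default="tables", help="folder for the .xlsx files")
    return parser.parse_args(argv)


def table_path(folder, index, ext="xlsx"):
    """Build the same table_<n>.<ext> file name that app.py uses."""
    return os.path.join(folder, f"table_{index}.{ext}")


def extract(pdf, out):
    """Write every table found in `pdf` to its own Excel file in `out`."""
    import tabula  # imported lazily so the helpers above work without it

    tables = tabula.read_pdf(pdf, pages="all")
    os.makedirs(out, exist_ok=True)
    written = []
    for i, table in enumerate(tables, start=1):
        path = table_path(out, i)
        table.to_excel(path, index=False)
        written.append(path)
    return written
```

With `args = parse_args()` followed by `extract(args.pdf, args.out)` at the bottom of the file, the script could be run as `python app.py table.pdf --out tables`.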

 


 

Now run the Python app by typing the command below:

 

python app.py

 

It creates the tables directory containing one Excel file per extracted table, and writes output.csv to the current directory. The batch call additionally writes a CSV file next to each PDF in the pdfs folder.
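If you want to confirm the results from Python rather than by browsing the folder, something like the following sketch could be used. The function name `list_outputs` is hypothetical, and the default paths simply mirror the names used in app.py.

```python
import os


def list_outputs(folder="tables", csv_path="output.csv"):
    """Return (.xlsx file names in `folder` sorted by name, whether the CSV exists)."""
    if os.path.isdir(folder):
        xlsx_files = sorted(n for n in os.listdir(folder) if n.endswith(".xlsx"))
    else:
        # A missing folder simply means no Excel files were written
        xlsx_files = []
    return xlsx_files, os.path.isfile(csv_path)
```

Running `print(list_outputs())` from the directory that holds app.py shows which Excel files were produced and whether output.csv is present.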

 

 

 

 

 
