Python 3 Camelot Library Example Script to Extract All Tables From a PDF Document and Save Them as XLSX, CSV and HTML Files: Full Project For Beginners


 

Welcome folks! In this post we will extract all the tables from PDF documents using the Camelot library in Python. The full source code of the application is shown below.


Get Started


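To get started, install the library with pip as shown below. Note that the package on PyPI is named camelot-py, while the module you import in code is camelot. Running pip install camelot installs a different, unrelated package.

pip install camelot-py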

 

After installing it, create an app.py file and paste in the following code:

 

app.py

 

 

import camelot

# PDF file to extract tables from (hardcoded path, change as needed)
file = "table.pdf"

# extract the tables from every page of the PDF file
# (without pages="all", read_pdf only looks at the first page)
tables = camelot.read_pdf(file, pages="all")

# number of tables extracted
print("Total tables extracted:", tables.n)

# print the first table as a pandas DataFrame
print(tables[0].df)

# export the first table on its own as CSV
tables[0].to_csv("foo.csv")
# export the first table on its own as Excel (.xlsx extension)
tables[0].to_excel("foo.xlsx")

# or export every table as CSV, bundled into a single zip archive (foo.zip)
tables.export("foo.csv", f="csv", compress=True)

# export every table to HTML (one file per table, named by page and table number)
tables.export("foo.html", f="html")
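The script above writes only the first table to CSV and XLSX. If you want one CSV, XLSX and HTML file for every table, a loop along the following lines could work. This is a minimal sketch, not part of the original script: the function names output_paths and export_all and the file naming scheme are assumptions made for illustration, and it relies on the to_csv, to_excel and to_html methods of each Camelot table.

```python
from pathlib import Path


def output_paths(pdf_path, index, out_dir="."):
    """Build the CSV, XLSX and HTML output paths for the table at `index` (0-based).

    The naming scheme (<pdf name>_table_<n>.<ext>) is a hypothetical choice.
    """
    stem = Path(pdf_path).stem
    return {
        ext: Path(out_dir) / f"{stem}_table_{index + 1}.{ext}"
        for ext in ("csv", "xlsx", "html")
    }


def export_all(pdf_path, out_dir="."):
    """Save every table found in the PDF as CSV, XLSX and HTML; return the table count."""
    # Imported here so output_paths() can be used without camelot installed.
    import camelot

    Path(out_dir).mkdir(parents=True, exist_ok=True)

    # pages="all" scans the whole document instead of only the first page
    tables = camelot.read_pdf(str(pdf_path), pages="all")

    for i, table in enumerate(tables):
        paths = output_paths(pdf_path, i, out_dir)
        table.to_csv(str(paths["csv"]))
        table.to_excel(str(paths["xlsx"]))
        table.to_html(str(paths["html"]))

    return tables.n
```

Calling export_all("table.pdf", "out") would then write out/table_table_1.csv, out/table_table_1.xlsx, out/table_table_1.html and so on, one set per table, and return how many tables were found.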

 
