Python 3 Script to Extract Text From PDF File Using PyPDF2 Library Full Tutorial For Beginners


Welcome, folks. In this blog post we will extract text from a PDF file in Python 3 using the PyPDF2 library. The full source code of the project is given below.

 

 

Requirements

 

 

Python 3 should be installed on your system

The PyPDF2 library should be installed on your system

 

 

Installation

 

 

To install the PyPDF2 library, use the following pip command:

 

pip install pypdf2
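The method names PyPDF2 exposes changed between major releases, so it can help to confirm which version pip actually installed. The snippet below is a minimal sketch using the standard library's importlib.metadata (Python 3.8 or newer); the helper name installed_version is hypothetical and not part of PyPDF2.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# "PyPDF2" is the distribution name that pip installs
print(installed_version("PyPDF2"))
```

If this prints None, the library is not installed in the Python environment you are running.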

 

After installing it, create an app.py file in the project's root directory and copy and paste the following code into it:

 

 

app.py

 

 

# import the PDF reader class (PyPDF2 3.x replaced PdfFileReader with PdfReader)
from PyPDF2 import PdfReader

# open the PDF in binary mode; the with block closes the file automatically
with open('create.pdf', 'rb') as pdf_file:

    # create a PDF reader object
    reader = PdfReader(pdf_file)

    # print the number of pages in the PDF file
    print(len(reader.pages))

    # get the first page (pages are indexed from 0)
    page = reader.pages[0]

    # extract the text from the page and print it
    print(page.extract_text())
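The script above reads only the first page. To collect the text of every page, something like the following sketch could work. It assumes the PyPDF2 3.x names (PdfReader, .pages and extract_text()); the helper names pages_to_text and extract_all_text are hypothetical and chosen here for illustration only.

```python
def pages_to_text(pages, separator="\n"):
    """Join the text of every page-like object that offers extract_text()."""
    chunks = []
    for page in pages:
        # A page without a text layer (for example a scanned image) may give nothing
        text = page.extract_text() or ""
        chunks.append(text)
    return separator.join(chunks)


def extract_all_text(path):
    """Open the PDF at path and return the text of all its pages as one string."""
    # Imported here so that pages_to_text stays usable without PyPDF2
    from PyPDF2 import PdfReader

    with open(path, "rb") as pdf_file:
        reader = PdfReader(pdf_file)
        return pages_to_text(reader.pages)
```

Calling print(extract_all_text('create.pdf')) would then print the text of the whole document, with a newline between pages.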

 


Now run the app.py script with the following command:

 

python app.py

 

 

The script prints the number of pages in create.pdf, followed by the text extracted from the first page.

 

 

 

 
