Welcome, folks! In this blog post we will extract text from a PDF file in Python 3 using the PyPDF2 library. All the source code for the project is given below.
Requirements
Python 3 should be installed on your system
The PyPDF2 library should be installed on your system
Installation
To install the PyPDF2 library, we can use the pip command:
pip install pypdf2
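To confirm that the install worked, a small check like the one below can help. This is only a sketch: installed_version is a hypothetical helper name, not part of PyPDF2, and it assumes Python 3.8 or newer, where importlib.metadata is in the standard library.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version pip installed, or None if PyPDF2 is missing.
print("PyPDF2 version:", installed_version("PyPDF2"))
```

If this prints None, the pip command above most likely ran against a different Python installation than the one running the script.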
After installing it, create an app.py file inside the root directory and copy and paste the following code:
app.py
# importing required modules
import PyPDF2

# creating a pdf file object
pdfFileObj = open('create.pdf', 'rb')

# creating a pdf reader object
# (PdfReader replaces PdfFileReader, which was removed in PyPDF2 3.0)
pdfReader = PyPDF2.PdfReader(pdfFileObj)

# printing number of pages in pdf file
print(len(pdfReader.pages))

# creating a page object for the first page
pageObj = pdfReader.pages[0]

# extracting text from page
print(pageObj.extract_text())

# closing the pdf file object
pdfFileObj.close()
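The script reads only the first page of create.pdf. As a rough sketch of how the same idea extends to every page, the variant below loops over all pages and uses a with-block so the file is closed automatically. The function names collect_page_text and extract_all_text are made up for illustration, and the code assumes a PyPDF2 release that provides PdfReader, pages, and extract_text (2.x or 3.x).

```python
try:
    import PyPDF2
except ImportError:  # the library from the Requirements section is missing
    PyPDF2 = None


def collect_page_text(pages):
    """Return the extracted text of each page, in page order."""
    # Pages without extractable text (for example scanned images) may give
    # an empty result; normalise that to "" so every page has a string.
    return [page.extract_text() or "" for page in pages]


def extract_all_text(path):
    """Open the PDF at `path` and return a list with one string per page."""
    if PyPDF2 is None:
        raise RuntimeError("PyPDF2 is not installed; run: pip install pypdf2")
    # The with-block closes the file even if extraction raises an error.
    with open(path, "rb") as pdf_file:
        reader = PyPDF2.PdfReader(pdf_file)
        return collect_page_text(reader.pages)


# Example usage (assumes create.pdf exists in the current directory):
# for number, text in enumerate(extract_all_text("create.pdf"), start=1):
#     print(f"--- page {number} ---")
#     print(text)
```

Keeping the page loop in its own function means it can be tried out on any objects that offer an extract_text() method, without needing a real PDF on disk.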
Now execute the Python script app.py by running the following command:
python app.py
The output shows the number of pages in the PDF, followed by the text extracted from the first page.