Python 3 PikePDF Library Script to Extract All Links URLs From PDF Document Full Project For Beginners


 

Welcome folks, today in this post we will extract all the links (URLs) from a PDF document using the pikepdf library in Python. The full source code of the application is shown below.

 

 

 

Get Started

 

 

 

In order to get started, we need to install the pikepdf library using the pip command as shown below

 

pip install pikepdf

 

After installing the library, create an app.py file and copy and paste the following code into it

 

app.py

 

 

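import pikepdf  # pip3 install pikepdf

file = "##pathofpdffile##.pdf"
# file = "1710.05006.pdf"
pdf_file = pikepdf.Pdf.open(file)
urls = []
# iterate over PDF pages
for page in pdf_file.pages:
    # a page without annotations has no /Annots entry, so skip it
    annots = page.get("/Annots")
    if annots is None:
        continue
    for annot in annots:
        # only annotations that carry an action (/A) can hold a /URI
        action = annot.get("/A")
        if action is None:
            continue
        uri = action.get("/URI")
        if uri is not None:
            print("[+] URL Found:", uri)
            urls.append(uri)

print("[*] Total URLs extracted:", len(urls))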

 

 

In the above Python snippet, just replace the value of the file variable with the path of the input PDF file from which you want to extract all the links or URLs.
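If you want to reuse the same logic in other scripts, one possible approach is to wrap it in functions. The sketch below is only an illustration: the names collect_urls and extract_urls are hypothetical and not part of pikepdf. It assumes each page and annotation offers a dict-style .get(), which is how the script above already uses them.

```python
def collect_urls(pages):
    """Return every /URI found in the annotations of the given pages.

    Hypothetical helper: `pages` can be `pdf_file.pages` from pikepdf or
    any iterable of objects with a dict-style .get() (plain dicts work).
    """
    urls = []
    for page in pages:
        annots = page.get("/Annots")
        if annots is None:  # this page has no annotations at all
            continue
        for annot in annots:
            action = annot.get("/A")
            if action is None:  # annotation without an action, so no URI
                continue
            uri = action.get("/URI")
            if uri is not None:
                urls.append(str(uri))  # store plain Python strings
    return urls


def extract_urls(path):
    """Open the PDF at `path` with pikepdf and collect its link URLs."""
    import pikepdf  # imported here so collect_urls can be tried without pikepdf

    with pikepdf.Pdf.open(path) as pdf_file:
        return collect_urls(pdf_file.pages)
```

With this in place, calling extract_urls("1710.05006.pdf") should return the URLs as a list of strings instead of printing them one by one.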

 


After that, run the Python script by typing the command below

 

python app.py
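If you would rather not edit app.py every time the input file changes, you could read the path from the command line instead. This is just a minimal sketch, and pdf_path_from_args is a hypothetical helper name, not something from pikepdf.

```python
import sys


def pdf_path_from_args(argv):
    """Return the PDF path passed as the only command-line argument.

    Hypothetical helper: exits with a usage message when the argument
    is missing or when extra arguments are given.
    """
    if len(argv) != 2:
        raise SystemExit("usage: python app.py <file.pdf>")
    return argv[1]
```

In app.py you would then write file = pdf_path_from_args(sys.argv) in place of the hard-coded path and run the script as python app.py 1710.05006.pdf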

 

Here, for example, we are taking a PDF file which contains links or URLs

 

 

 

 

After the script runs, it prints every link found in the PDF file to the console, followed by the total number of URLs extracted.
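The same link can appear more than once in the output, for example when a document repeats it on several pages (this is an assumption about typical PDFs, not something shown in this post). A small sketch for removing the repeats while keeping the original order is given below. The name unique_urls is hypothetical.

```python
def unique_urls(urls):
    """Drop repeated URLs while keeping first-seen order (hypothetical helper)."""
    # dict keys keep insertion order, so the first occurrence of each URL wins;
    # str() turns pikepdf string objects into plain Python strings first
    return list(dict.fromkeys(str(u) for u in urls))
```

After the loop in app.py you could call unique_urls(urls) and print the result instead of the raw list.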

 

 

 
