Python 3 Script to Download PDF Files From URL Using BeautifulSoup4 and Requests Library Full Tutorial For Beginners


Welcome folks! In this blog post we will download PDF files from a URL using the BeautifulSoup4 and Requests libraries in Python. The full source code of the application is given below.

Get Started

To get started, install the following libraries:

 

pip install requests

 

pip install beautifulsoup4

 

After installing these libraries in your Python project, create an app.py file and paste in the following code:

 

app.py

 

# Import libraries
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# URL of the page to scan for PDF links
url = "https://nanonets.com/blog/deep-learning-ocr/"

# Request the page and stop early if the server returns an error status
response = requests.get(url, timeout=30)
response.raise_for_status()

# Parse the HTML of the page
soup = BeautifulSoup(response.text, 'html.parser')

# Find all hyperlinks that have an href attribute
links = soup.find_all('a', href=True)

i = 0

# Check every link for a PDF and download the file if one is found
for link in links:
    href = link['href']
    if '.pdf' in href.lower():
        i += 1
        print("Downloading file:", i)

        # Resolve relative links against the page URL, then fetch the file
        pdf_response = requests.get(urljoin(url, href), timeout=30)
        pdf_response.raise_for_status()

        # Write the content to pdf1.pdf, pdf2.pdf, ...
        with open("pdf" + str(i) + ".pdf", 'wb') as pdf:
            pdf.write(pdf_response.content)
        print("File", i, "downloaded")

print("All PDF files downloaded")
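
Links on a web page can be relative (for example /files/report.pdf), while requests needs an absolute URL, and a counter such as pdf1.pdf does not tell you which file is which. The sketch below is separate from app.py and shows one possible way to resolve a link against the page URL and to derive a file name from the URL. The helper names resolve_link and filename_from_url, the fallback name, and the example.com URLs are all made up for illustration.

```python
import posixpath
from urllib.parse import unquote, urljoin, urlparse


def resolve_link(page_url, href):
    """Turn an href (absolute or relative) into an absolute URL."""
    return urljoin(page_url, href)


def filename_from_url(file_url, fallback="download.pdf"):
    """Use the last path segment of the URL as the file name.

    The query string is ignored and percent-escapes are decoded.
    If the path ends with a slash, the fallback name is returned.
    """
    path = unquote(urlparse(file_url).path)
    name = posixpath.basename(path)
    return name or fallback


# Hypothetical page and links, only to show the behaviour
page = "https://example.com/blog/post/"
hrefs = [
    "/files/report.pdf",                # relative to the site root
    "paper%20v2.pdf?download=1",        # relative to the page, with a query
    "https://cdn.example.com/a/b.pdf",  # already absolute
]

for href in hrefs:
    absolute = resolve_link(page, href)
    print(absolute, "->", filename_from_url(absolute))
```

In a script like app.py, these two helpers could replace the direct use of the href and the "pdf" + str(i) + ".pdf" name, at the cost of possible name clashes when two links end in the same file name.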

 


The script already contains the URL of the page to scan for PDF files, so you can now run it with the command below:

 

python app.py


After the script finishes, you can see that it has downloaded all three PDF files linked from the page and saved them as pdf1.pdf, pdf2.pdf and pdf3.pdf in the directory the script was run from (here, the project's root directory).
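
If you want to confirm that the saved files are real PDFs and not, say, an HTML error page saved under a .pdf name, one quick check is to look at the first bytes of each file, since PDF files begin with %PDF-. Here is a minimal sketch of that check. The helper looks_like_pdf is a made-up name and is not part of app.py.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file starts with the PDF signature b'%PDF-'."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"


# Check every .pdf file in the current working directory
for pdf_path in sorted(Path(".").glob("*.pdf")):
    if looks_like_pdf(pdf_path):
        print(pdf_path.name, "looks like a PDF")
    else:
        print(pdf_path.name, "does not look like a PDF")
```

This only inspects the file header, so it will not catch a PDF that is truncated or damaged further in.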

 
