Python 3 Web Scraping Selenium Script to Bulk Download all Images From URL in Command Line Using BeautifulSoup4 Library Full Project For Beginners
Welcome, folks. In this post we will build a web scraper in Python
that automatically downloads all the images
found on a given website URL,
using the Selenium and BeautifulSoup4 libraries. The full source code of the application is shown below.
Get Started
To get started, check which version of Chrome (or whichever browser you use) is installed,
then download the driver that matches that browser and version. Chrome users need ChromeDriver.
After downloading, store the driver in the root directory of your Python project. Now install the following libraries using the pip
command as shown below
pip install requests
pip install beautifulsoup4
pip install selenium
pip install lxml
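Before moving on, it can help to confirm that the packages are importable from the interpreter you plan to run the script with. The following is a minimal sketch that uses only the standard library. The helper name missing_packages is hypothetical and not part of the original project. Note that it checks import names rather than pip names (beautifulsoup4 is imported as bs4), and that lxml is included because the script below asks BeautifulSoup for the "lxml" parser.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names the import system cannot find."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names, not pip names: beautifulsoup4 installs the module "bs4".
    missing = missing_packages(["requests", "bs4", "selenium", "lxml"])
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All libraries are installed.")
```

If anything is reported as missing, run the matching pip install command again in the same environment.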
After installing these libraries, create an app.py
file and copy and paste the following code into it
app.py
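import os
import time
from urllib.parse import urljoin

import requests as rq
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Example driver path: E:\web scraping\chromedriver_win32\chromedriver.exe
path = input("Enter Path : ")
url = input("Enter URL : ")
output = "output"


def get_url(path, url):
    # Selenium 4 takes the driver path through a Service object; the old
    # executable_path argument is no longer accepted by recent releases.
    driver = webdriver.Chrome(service=Service(executable_path=path))
    try:
        driver.get(url)
        print("loading.....")
        # Wait before reading the DOM so late-loading images can appear.
        time.sleep(60)
        return driver.execute_script("return document.documentElement.outerHTML")
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()


def get_img_links(res):
    # The "lxml" parser requires the lxml package (pip install lxml).
    soup = BeautifulSoup(res, "lxml")
    # Keep only <img> tags that actually carry a src attribute.
    return soup.find_all("img", src=True)


def download_img(img_link, index):
    try:
        extensions = [".jpeg", ".jpg", ".png", ".gif"]
        extension = ".jpg"  # fallback when the URL names no known type
        for ext in extensions:
            if ext in img_link.lower():
                extension = ext
                break
        img_data = rq.get(img_link, timeout=30).content
        # os.path.join keeps the path valid on Windows, macOS and Linux.
        with open(os.path.join(output, str(index + 1) + extension), "wb") as f:
            f.write(img_data)
    except Exception as err:
        # Report the failure and carry on with the next image.
        print("Skipped {}: {}".format(img_link, err))


result = get_url(path, url)
img_links = get_img_links(result)
if not os.path.isdir(output):
    os.mkdir(output)
for index, img_tag in enumerate(img_links):
    src = img_tag["src"]
    if src:
        print("Downloading...")
        # Resolve relative src values against the page URL.
        download_img(urljoin(url, src), index)
print("Download Complete!!")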
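Two details in the script are worth looking at in isolation: turning an img src value into an absolute URL, and choosing a file extension for the saved file. The following is a minimal sketch of one way to handle both with the standard library only. The helper names resolve_image_url and guess_extension, the KNOWN_EXTENSIONS tuple, and the example.com addresses are hypothetical and are not part of the original script. Unlike a search over the whole URL, this version reads the extension from the URL path only, so a query string such as ?next=b.png does not affect the result.

```python
import os
from urllib.parse import urljoin, urlparse

# Same image types the script looks for.
KNOWN_EXTENSIONS = (".jpeg", ".jpg", ".png", ".gif")


def resolve_image_url(page_url, src):
    """Turn a possibly relative src value into an absolute URL."""
    return urljoin(page_url, src)


def guess_extension(img_url, default=".jpg"):
    """Pick the extension from the URL path, ignoring query and fragment."""
    path = urlparse(img_url).path
    ext = os.path.splitext(path)[1].lower()
    return ext if ext in KNOWN_EXTENSIONS else default


if __name__ == "__main__":
    page = "https://example.com/gallery/index.html"  # hypothetical page URL
    samples = ["/img/cat.PNG?size=large", "photos/dog.jpeg",
               "//cdn.example.com/a.gif", "banner"]
    for src in samples:
        full = resolve_image_url(page, src)
        print(full, guess_extension(full))
```

Whether this is worth the extra code depends on the site: pages that only use absolute image URLs with plain file names work without it.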
Now, to run this script, type the command below
python app.py
After you run the command, it asks for the ChromeDriver
path, as shown in the figure below.
Provide the ChromeDriver path and then the website URL, including the https prefix.
The prefix is mandatory for the application to work. The script then opens the URL in the Chrome browser, downloads all the images present on the page, and stores them in a folder called output,
which is shown below
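Because the URL must include its scheme, a small check before starting the browser can catch a mistyped address early. The following is a minimal sketch under stated assumptions: the helper name is_full_web_url is hypothetical, the sample addresses are placeholders, and accepting plain http in addition to https is an assumption of this sketch rather than something the original script states.

```python
from urllib.parse import urlparse


def is_full_web_url(url):
    """Return True when the URL has an http(s) scheme and a host name."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    # Placeholder addresses used only to show the check.
    for candidate in ["https://example.com/gallery", "example.com/gallery"]:
        if is_full_web_url(candidate):
            print(candidate, "looks usable")
        else:
            print(candidate, "is missing the https:// prefix")
```

One way to use it would be to call is_full_web_url on the value entered at the Enter URL prompt and ask again when it returns False.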