Python 3 Script to Find Broken Links of Website or Domain URL Using BeautifulSoup4 and Requests Library SEO Tool Full Project For Beginners

Python 3 Script to Find Broken Links of Website or Domain URL Using BeautifulSoup4 and Requests Library SEO Tool Full Project For Beginners

 

Welcome folks today in this blog post we will be finding broken links of website using beautifulsoup4 and requests library in python. All the full source code of the application will be shown below.

 

 

 

Get Started

 

 

 

In order to get started you need to include the following libraries as shown below

 

pip install bs4

 

pip install requests

 

After installing all the libraries make an app.py file and copy paste the following code

 

app.py

 

 

import requests
import sys
from bs4 import BeautifulSoup
from urllib.parse import urlparse
from urllib.parse import urljoin

searched_links = []
broken_links = []

def getLinksFromHTML(html):
    def getLink(el):
        return el["href"]
    return list(map(getLink, BeautifulSoup(html, features="html.parser").select("a[href]")))

def find_broken_links(domainToSearch, URL, parentURL):
    if (not (URL in searched_links)) and (not URL.startswith("mailto:")) and (not ("javascript:" in URL)) and (not URL.endswith(".png")) and (not URL.endswith(".jpg")) and (not URL.endswith(".jpeg")):
        try:
            requestObj = requests.get(URL);
            searched_links.append(URL)
            if(requestObj.status_code == 404):
                broken_links.append("BROKEN: link " + URL + " from " + parentURL)
                print(broken_links[-1])
            else:
                print("NOT BROKEN: link " + URL + " from " + parentURL)
                if urlparse(URL).netloc == domainToSearch:
                    for link in getLinksFromHTML(requestObj.text):
                        find_broken_links(domainToSearch, urljoin(URL, link), URL)
        except Exception as e:
            print("ERROR: " + str(e));
            searched_links.append(domainToSearch)

find_broken_links(urlparse(sys.argv[1]).netloc, sys.argv[1], "")

print("\n--- DONE! ---\n")
print("The following links were broken:")

for link in broken_links:
    print ("\t" + link)

 

See also  Python Tkinter Password List Manager with Login and Registration Using Sqlite3 and Cryptography Libraries Full Project For Beginners

 

Now in order to run this python script by executing the` below command

 

 

python app.py ##yourdomainname###

 

This becomes

 

python app.py https://freemediatools.com

 

And if you execute the above command it will check all the urls and print to the console like this

 

 

 

 

And now at last it will show all the broken links to you as shown below

 

 

 

Leave a Reply