Python 3 Web Scraping Script to Scrape ICC Player Rankings into CSV Files Using BeautifulSoup4 and Pandas: Full Project for Beginners



Welcome, folks! In this blog post we will scrape the ICC player rankings into CSV files using the BeautifulSoup4 and pandas libraries in Python 3. The full source code of the application is shown below.




Get Started




To get started, install the required libraries with pip as shown below:



pip install requests

pip install beautifulsoup4

pip install lxml

pip install pandas
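If you want to confirm the installs worked before running the scraper, a small check like the one below can help. It is a minimal sketch, not part of the original project: it uses only the standard library's importlib.util.find_spec to look for the modules the script imports (requests, bs4, pandas) plus lxml, which the script passes to BeautifulSoup as its parser. Note that the beautifulsoup4 package is imported under the name bs4. The helper name missing_modules is hypothetical.

```python
import importlib.util

# Import names the scraper relies on. The beautifulsoup4 package is imported
# as "bs4", and "lxml" is the parser the script asks BeautifulSoup to use.
REQUIRED_MODULES = ["requests", "bs4", "lxml", "pandas"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that Python cannot find (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing), "- install them with pip first.")
    else:
        print("All scraper dependencies are installed.")
```

Checking with find_spec avoids actually importing the packages, so the check stays quick and does not fail halfway through on the first missing module.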



After installing these libraries, create a new Python file and paste in the following code:



# -*- coding: utf-8 -*-
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent so the site serves the normal HTML page
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
}

# Rankings pages to scrape (one ranking table per page).
# The URL list was lost from this copy of the post; add the ICC rankings page URLs here.
urls = [
]

# Combined file that collects the rows of every ranking type.
# Note: sep="\t" writes tab-separated values even though the files end in .csv.
final_result_file_name = "All Ranking List.csv"
final_column_names = ["Ranking Type", "Position", "Player Name", "Team Name", "Rating", "Career Best Rating", "Crawl URL"]
pd.DataFrame(columns=final_column_names).to_csv(final_result_file_name, sep="\t", index=False, encoding="utf-8")

for url in urls:
    request_object = requests.get(url, headers=headers)
    html_content = request_object.text
    print(request_object.status_code, "->", url)
    soup_object = BeautifulSoup(html_content, "lxml")

    # Remove the up/down movement markers so they don't end up in the position text
    for element in soup_object.select('[class="ranking-pos up"], [class="ranking-pos down"]'):
        element.decompose()

    # The table title names the ranking type and its output file
    ranking_type = soup_object.select_one(".rankings-block__title-container > h4").text
    ranking_type = re.sub(r"\s+", " ", ranking_type).strip()

    # One file per ranking type: write the header row once, then append data rows
    result_file_name = ranking_type + ".csv"
    column_names = ["Position", "Player Name", "Team Name", "Rating", "Career Best Rating", "Crawl URL"]
    pd.DataFrame(columns=column_names).to_csv(result_file_name, sep="\t", index=False, encoding="utf-8")

    for element in soup_object.select('table[class="table rankings-table"] tr'):
        data_dict = dict()
        data_dict["Crawl URL"] = url
        data_dict["Ranking Type"] = ranking_type
        try:
            data_dict["Position"] = element.select_one('[class*="position"]').text
            for player_name in element.select('a[href*="/player-rankings"]'):
                data_dict["Player Name"] = player_name.text
            # The team is taken from the last CSS class on the flag element
            data_dict["Team Name"] = element.select_one('[class^="flag-15"]')["class"][-1]
            data_dict["Rating"] = element.select_one('[class$="rating"]').text
            data_dict["Career Best Rating"] = element.select_one("td.u-hide-phablet").text
        except (AttributeError, TypeError):
            # Rows without these cells (such as the header row) hold no player data; skip them
            continue

        # Collapse runs of whitespace and trim every value
        for key in data_dict:
            data_dict[key] = re.sub(r"\s+", " ", data_dict[key]).strip()

        # Append the row to the per-ranking file and to the combined file
        pd.DataFrame([data_dict], columns=column_names).to_csv(result_file_name, sep="\t", index=False, header=False, encoding="utf-8", mode="a")
        pd.DataFrame([data_dict], columns=final_column_names).to_csv(final_result_file_name, sep="\t", index=False, header=False, encoding="utf-8", mode="a")
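To see what the row-parsing step does without contacting the live site, the sketch below runs the same CSS selectors and whitespace clean-up against a small, hand-written HTML table. The markup and values (SAMPLE_HTML, "Player One", team code "ABC") are made up for illustration and only imitate the class names the script targets; the real ICC pages may be structured differently. It uses Python's built-in "html.parser" so lxml is not needed, and parse_rows is a hypothetical helper, not part of the original project.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: a hand-written table that only imitates the class names
# the scraper's selectors look for. It is NOT copied from the ICC website.
SAMPLE_HTML = """
<table class="table rankings-table">
  <tr><th>Pos</th><th>Player</th><th>Team</th><th>Rating</th><th>Best</th></tr>
  <tr>
    <td class="table-body__position">
      1 <span class="ranking-pos up">(+1)</span>
    </td>
    <td><a href="/player-rankings/123">Player  One</a></td>
    <td><span class="flag-15 ABC"></span></td>
    <td class="table-body__rating">  900 </td>
    <td class="u-hide-phablet">910</td>
  </tr>
</table>
"""


def parse_rows(html):
    """Parse ranking rows with the same selectors as the scraper (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop movement markers before reading the position text
    for marker in soup.select('[class="ranking-pos up"], [class="ranking-pos down"]'):
        marker.decompose()

    rows = []
    for tr in soup.select('table[class="table rankings-table"] tr'):
        try:
            row = {
                "Position": tr.select_one('[class*="position"]').text,
                "Player Name": tr.select_one('a[href*="/player-rankings"]').text,
                # Last CSS class on the flag element
                "Team Name": tr.select_one('[class^="flag-15"]')["class"][-1],
                "Rating": tr.select_one('[class$="rating"]').text,
                "Career Best Rating": tr.select_one("td.u-hide-phablet").text,
            }
        except (AttributeError, TypeError):
            continue  # header row: select_one() found nothing and returned None
        # Collapse whitespace runs and trim, as the scraper does
        rows.append({key: re.sub(r"\s+", " ", value).strip() for key, value in row.items()})
    return rows


if __name__ == "__main__":
    for row in parse_rows(SAMPLE_HTML):
        print(row)
```

In this sketch the header row has no cells matching the selectors, so select_one returns None, the attribute access raises, and the row is skipped; only the single data row comes back, with the movement marker removed and the doubled space in the name collapsed.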

