Python 3 Web Scraping Script to Scrape ICC Country Matches (Future Tour Programme) in CSV Files Using BeautifulSoup4 and Pandas Library Full Project For Beginners





Welcome folks today in this blog post we will be scraping icc country matches (future tour programme) as csv files using beautifulsoup4 and pandas library in python 3. All the full source code of the application is shown below.




Get Started




In order to get started you need to install the below libraries using the pip command as shown below



pip install bs4



pip install pandas



After installing this library make an file and copy paste the following code



# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup
import re
import pandas as pd

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"

urls = [

result_file_name = "Schedule List.csv"
column_names = ["Type", "Match Type", "Name", "Home Team Name", "Away Team Name", "Start Date & Time", "End Date & Time", "Venue", "City", "Crawl URL"]
pd.DataFrame(columns=column_names).to_csv(result_file_name, sep="\t", index=False, encoding="utf-8")

for url in urls:
    request_object = requests.get(url, headers=headers)
    html_content = request_object.text
    print(request_object.status_code, "->", url)
    soup_object = BeautifulSoup(html_content, "lxml")
    type_ = soup_object.select_one('').text
    if("women" in type_.lower()):
        type_ = "Women's"
        type_ = "Men's"
    for element in'[class="js-matchlist"] > .match-block'):
        data_dict = dict()
        data_dict["Type"] = type_
        data_dict["Crawl URL"] = url
        data_dict["Match Type"] = element.select_one('[class="match-block__type"]').text
        data_dict["Start Date & Time"] = element.select_one('[data-startdate]')["data-startdate"]
        data_dict["End Date & Time"] = element.select_one('[data-enddate]')["data-enddate"]
        if("away" in element.select_one('.match-block__team:first-child')):
            data_dict["Away Team Name"] = element.select_one('.match-block__team:first-child').text
            data_dict["Home Team Name"] = element.select_one('.match-block__team:nth-child(3)').text
            data_dict["Home Team Name"] = element.select_one('.match-block__team:first-child').text
            data_dict["Away Team Name"] = element.select_one('.match-block__team:nth-child(3)').text
        temp_data = element.select_one('.match-block__summary').text.split("|")
        data_dict["Name"] = temp_data[0].split(",")[-1]
        data_dict["Venue"], data_dict["City"] = temp_data[-1].split(",")
        for key in data_dict.keys():
            data_dict[key] = re.sub(r"\s+", " ", data_dict[key])
            data_dict[key] = data_dict[key].strip()
        pd.DataFrame([data_dict], columns=column_names).to_csv(result_file_name, sep="\t", index=False, header=False, encoding="utf-8", mode="a")


See also  Python 3 Script to Export a Pandas DataFrame to an Excel File Full Project For Beginners



Leave a Reply