Storing Scraped Data

[toc]

Introduction

After scraping large chunks of data, you need to store it in a proper format. You may want to store the data directly in a text file, or you may opt for a more structured format such as a CSV file or an Excel sheet. Alternatively, you may want to store it directly in a database. Thus, it becomes crucial to understand how to store scraped data in a file or a database.

Before you scrape or store data, it is essential to have a clear picture of what the data looks like. Structured data that is consistent (i.e., follows the same structure throughout) is probably best stored in a CSV or Excel file, whereas data scraped haphazardly from different pages is easier to store in separate files. We will unearth some of these techniques in this tutorial and store the scraped data in different formats. Without further delay, let the storage begin! πŸ˜‰

Prerequisite

Before we store the data, we need to scrape it. We will be scraping data from the following website: https://webscraper.io/test-sites/e-commerce/allinone/computers/laptops

Data that we will scrape:

  • Product Name
  • Product Price
  • Product Description
  • Product Rating

❖ Disclaimer: This tutorial assumes that you have basic knowledge of web scraping with BeautifulSoup. The purpose of this article is to show you how to store content scraped from websites in a file or database. The examples and theories mentioned in this tutorial are solely for educational purposes, and we trust that you will not misuse them. Any misuse is solely your responsibility, and we are not responsible for it. If you would like to learn the basic concepts of web scraping before diving into this tutorial, please follow the lectures at this link.

The following code demonstrates how to scrape the necessary data from the above-mentioned website. Before learning how to store the data, please go through this snippet to see how it is scraped.

# 1. Import the necessary LIBRARIES
import requests
from bs4 import BeautifulSoup

# 2. Create a User Agent (Optional)
headers = {"User-Agent": "Mozilla/5.0 (Linux; U; Android 4.2.2; he-il; NEO-X5-116A Build/JDQ39) AppleWebKit/534.30 ("
                         "KHTML, like Gecko) Version/4.0 Safari/534.30"}
# 3. Define Base URL
url = 'https://webscraper.io/test-sites/e-commerce/allinone/computers/laptops'

# 4. Send get() Request and fetch the webpage contents
response = requests.get(url, headers=headers)

# 5. Create a Beautiful Soup Object
soup = BeautifulSoup(response.content, 'html.parser')

# 6. The Scraping Logic
name = [name.text for name in soup.find_all('a', class_='title')]  # to store the product names
price = [price.text for price in soup.find_all('h4', class_='pull-right price')]  # to store the product prices
description = [desc.text for desc in soup.find_all('p', class_='description')]  # to store the product descriptions
# to store the product ratings: each rating is the number of star <span> elements
rating = [len(rate.find_all('span')) for rate in soup.find_all('div', class_='ratings')]

for i in range(len(name)):
    print(name[i])
    print(price[i])
    print(description[i])
    print(rating[i])
    print()
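
A quick robustness note: the snippet above assumes the request succeeded. Before parsing, you may want to fail fast on a bad response; one minimal guard, using requests’ built-in check, is:

# raise an exception early if the server returned a 4xx/5xx status code
response.raise_for_status()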


How To Store Scraped Data In A Text File?

Once you have scraped the data, you can store it in a text file. Let’s focus on the steps that will help us do so:

  1. Create the text file and open it in your script. To ensure that special characters are properly extracted from the page, use the proper encoding standard.
  2. Store the data for each product in the file one by one with the help of a for loop and the write() method.
  3. Once the data for all the products has been successfully written to the file, close the file.

Code:

def Save_to_file():
    # create and open a file by the name data in write mode
    # encoding ensures that special characters are properly extracted from the page.
    f = open('data.txt', 'w', encoding="utf-8")
    # iterate through each product data stored in individual lists and store them one by one
    for i in range(len(name)):
        f.write(name[i] + "\n")
        f.write(price[i] + "\n")
        f.write(description[i] + "\n")
        f.write(str(rating[i])+"\n")
        f.write("=========================================================\n")
    # close the file once all the data has been stored in it.
    f.close()
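
As a design note, the same routine can be written with a context manager, which closes the file automatically even if a write fails midway. Here is a minimal variant of the function above, assuming the same name, price, description, and rating lists from the scraping snippet:

def Save_to_file_with():
    # 'with' closes the file automatically when the block is exited
    with open('data.txt', 'w', encoding="utf-8") as f:
        # iterate through each product's data and write it line by line
        for i in range(len(name)):
            f.write(name[i] + "\n")
            f.write(price[i] + "\n")
            f.write(description[i] + "\n")
            f.write(str(rating[i]) + "\n")
            f.write("=========================================================\n")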

How To Store Scraped Data In A CSV File?

You can store the data in a more structured way by writing it to a CSV file.

Approach:

  • Define the headers for the columns of your CSV file and store them in a list.
  • Create a single list and append the title, price, description, and rating of each product to it.
  • Segregate the values for each product by slicing this combined list into groups of four and storing the slices in another list.
  • Create and open a new CSV file.
    • Note: You must import the csv module before you can use the csv.writer object to write to the CSV file.
  • Once the file is created, store the data in the CSV file with the help of the csv.writer object.

Code:

import csv
def Save_csv():
    row_head =['Title', 'Price', 'Description', 'Rating']
    Data = []
    for title, amount, desc, rate in zip(name, price, description, rating):
        Data.append(title)
        Data.append(amount)
        Data.append(desc)
        Data.append(rate)
    rows = [Data[i:i + 4] for i in range(0, len(Data), 4)]
    with open('data.csv', 'w', encoding='utf_8_sig', newline="") as csvfile:
        csvwriter = csv.writer(csvfile)
        csvwriter.writerow(row_head)
        csvwriter.writerows(rows)
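
Since the four lists are parallel, the flatten-then-slice step can also be collapsed: zip() already yields one (title, price, description, rating) tuple per product, and csv.writer accepts those tuples directly as rows. A minimal equivalent sketch, assuming the same scraped lists:

import csv

def Save_csv_zip():
    # zip() pairs up the parallel lists, producing one row per product
    with open('data.csv', 'w', encoding='utf_8_sig', newline="") as csvfile:
        csvwriter = csv.writer(csvfile)
        csvwriter.writerow(['Title', 'Price', 'Description', 'Rating'])
        csvwriter.writerows(zip(name, price, description, rating))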

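The introduction also mentioned Excel sheets. The csv module cannot write .xlsx files, but if you have pandas installed, a short sketch along the following lines would work. This is our own assumption rather than part of the original tutorial, and to_excel() additionally requires the openpyxl package:

import pandas as pd

def Save_excel():
    # build a DataFrame from the parallel lists and write it to an .xlsx file
    df = pd.DataFrame({'Title': name, 'Price': price,
                       'Description': description, 'Rating': rating})
    df.to_excel('data.xlsx', index=False)  # requires openpyxl to be installed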

How To Store Scraped Data In A SQL Database?

Disclaimer: It is assumed that you have some basic SQL knowledge, as the purpose of this tutorial is not to educate you on SQL. Rather, the idea is to show you how you can integrate a web scraper with a database.

If the data to be stored is huge and requires more complex management, a database is the right choice. In this scenario, storing the scraped data in a SQL database by integrating your Python code with the database is an easy option. Hence, in this section, we will learn how to use Python to connect to a MySQL database and store the scraped data.

Approach:

  • Create a connection with the MySQL database.
    • We will be using the “MySQL Connector” driver in this tutorial to connect to the database. It is recommended that you install it with pip.
    • Open your terminal and run the following command: python -m pip install mysql-connector-python.
    • Use the username and password for your instance to connect to the database.
  • Once the connection has been established to your database, initiate a cursor object to create a new Table with the help of the CREATE TABLE SQL query.
  • Once the table is created, insert the scraped values in the created table one by one with the help of a loop.
    • Note that we are following a similar approach to writing to a CSV file, except that we commit the data to a MySQL database. Also, it is crucial to close both the connection and the cursor.

Code:

import mysql.connector

def Store_to_mysql():
    # establish a connection with MySQL and create the database
    connection = mysql.connector.connect(host='localhost',
                                         user='root',
                                         password='root')
    # creating database_cursor to perform SQL operation
    cursor = connection.cursor()
    cursor.execute("CREATE DATABASE data_db")
    print("Connection to MySQL Established!")
    insert_data()

def insert_data():
    # establish connection to database
    connection = mysql.connector.connect(host='localhost',
                                         user='root',
                                         password='root',
                                         database='data_db')
    print("Connected to Database!")
    cursor = connection.cursor()
    mySql_Create_Table_Query = """
                                CREATE TABLE data_table ( 
                                        title varchar(250) NOT NULL,
                                        price varchar(250) NOT NULL,
                                        description varchar(250) NOT NULL,
                                        rating varchar(250) NOT NULL)
                                        """
    cursor.execute(mySql_Create_Table_Query)
    insert_query = """INSERT INTO data_table (title, price, description, rating) 
                                                   VALUES (%s, %s, %s, %s) """
    for i in range(len(name)):  # 'name' holds the scraped product titles
        records = (name[i], price[i], description[i], str(rating[i]))
        cursor.execute(insert_query, records)
        connection.commit()
    
    # close cursor and connection
    cursor.close()
    connection.close()
    print("Connection Closed!")

Store_to_mysql()
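
To confirm that the rows actually landed in the table, you can run a quick SELECT against the same database. This is a minimal check, assuming the same local credentials used above:

def Verify_rows():
    # reconnect to data_db and print every stored product record
    connection = mysql.connector.connect(host='localhost',
                                         user='root',
                                         password='root',
                                         database='data_db')
    cursor = connection.cursor()
    cursor.execute("SELECT title, price, rating FROM data_table")
    for row in cursor.fetchall():
        print(row)
    cursor.close()
    connection.close()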



Web Scraping with BeautifulSoup

One of the most sought-after skills on Fiverr and Upwork is web scraping.

Make no mistake: extracting data programmatically from websites is a critical life skill in today’s world, shaped as it is by the web and remote work.

This course teaches you the ins and outs of Python’s BeautifulSoup library for web scraping.