Python Working with the Pandas DataFrame & MySQL – Part 3

Background & Getting Started

Part 3 of this series centers around creating a MySQL table and inserting records from the CSV file referenced in Part 2. Working with MySQL is a must-have if you are interested in pursuing a career as a Data Scientist.

After completing Part 2, you should be comfortable:

  • Connecting to a Localhost/Server
  • Writing and executing a simple query to create a database
  • Closing any open localhost/server connections

At the end of Part 2, we saved our new DataFrame to crimes_01.csv. For this article, we will be reading in and accessing this same file.

Add the Required Starter Code below to the top of each code snippet.  Modify the code below with the appropriate host and login credentials to ensure a successful connection. This snippet will allow the code to run error-free.

Required Starter Code:

import pandas as pd
import mysql.connector
from mysql.connector import Error

file_name = 'crimes_01.csv'

try:
    connection  = mysql.connector.connect(
        host       = 'localhost',
        user       = 'username',
        passwd  = 'password')
except Error as e:
    print(f"The error '{e}' occurred")

Create USE Query

Part 2 covered creating a MySQL database. In this section, a table will be created and attached to this database (crimestats2019).

Before adding a table, a database first needs to be selected.  This step lets MySQL know which database to associate with a table. To select a database, run the code below.

  • Line [9] creates a query statement to select the database.
  • Line [10] then creates a cursor to reference the connection.
  • Line [11] initiates the try statement.
  • Line [12] attempts to execute the query statement on line [9].
  • Line [13] if successful, the connection commits the query (select the database).
  • Line [14] outputs a message to the terminal.
  • Line [15] is the except statement. If the connection fails, the code falls here.
  • Line [16] outputs the error to the terminal.
query  = 'USE crimestats2019;'
cursor = connection.cursor()

try:
    cursor.execute(query)
    connection.commit()
    print('Database selection was successful.')
except Error as e:
    print(f"The error '{e}' occurred")

Output:

Database selection was successful.

Create MySQL Database Table

With a successful connection and the database selected, we can move forward. Let’s start by opening the CSV file crimes_01.csv to assess the structure.

For this example, navigate to the current working directory and double-click to open the CSV in Excel. Below, find the firsts few lines of the crimes_01.csv file.

The next step is to create a query that adds a table to the database selected earlier.  Running the code below will create a new table. Review the structure. For simplicity, all strings, except for the Primary Key are assigned the data type VARCHAR (string).

Code:

query  = 'USE crimestats2019;'
cursor = connection.cursor()

try:
    cursor.execute(query)
    connection.commit()
    print('The crimestats2019 database was selected')
except Error as e:
    print(f"The error '{e}' occurred")

query = '''
CREATE TABLE statsdata (
    ID int NOT NULL AUTO_INCREMENT,
    Record_ID VARCHAR(10),
    Offense_Code VARCHAR(4),
    Police_District VARCHAR(3),
    Beat VARCHAR(3),
    Grid VARCHAR(4),
    Occurence_Date VARCHAR(10),
    Occurence_Time VARCHAR(5),
    PRIMARY KEY (ID)); 
'''

try:
    cursor.execute(query)
    connection.commit()
    print('The statsdata Table was created')
except Error as e:
    print(f"The error '{e}' occurred")

Output:

To confirm the above code worked, log in to the localhost/server. On the left-hand side, the database and associated table should now be visible.

Click the Structure link to view the structure.

The table structure on the right-hand side matches up to our code above.

Insert Records into MySQL Table

The final step is to insert the records from the CSV into the Table.

Note: With the Required Start Code added in, you should be successfully connected to a localhost/server and have selected the appropriate database. 

The next step is to read in the CSV file crimes_01.csv and assign it to a DataFrame.

There are a few changes from the DataFrame code created earlier:

  • The cols list now contains the Record_ID.
  • The index_cols parameter is removed from read_csv(). This is not needed as we have created a new index (auto-increment) field called ID.
  • The DataFrame is converted to a list.

Code:

cols = ['Record_ID', 'Offense_Code', 'Police_District', 'Beat', 'Grid', 'Occurence_Date', 'Occurence_Time']

df = pd.read_csv(file_name,
                 sep=',',
                 usecols=cols,
                 parse_dates=['Occurence_Date'])

data_list = df.values.tolist()

A sample row should now look like this:

[1359411, 1315, '3', '3B', '0736', Timestamp('2019-01-01 08:58:00+0000', tz='UTC'), '08:58']

Now we are ready to create the INSERT INTO query!

  • In MySQL, there are two parts to inserting a record. The first part is the INSERT INTO statement. This statement is needed to let MySQL know what fields to expect in the query.

Code:

q_hdr = 'INSERT INTO statsdata ('
for f in df.columns:
    q_hdr += f + ','
q_hdr = q_hdr[0:-1] + ') '

Output:

INSERT INTO statsdata (Record_ID,Offense_Code,Police_District,Beat,Grid,Occurence_Date,Occurence_Time)

The above line is called and appended to the second part of the query each time a query gets executed.

The second part is the VALUES statement. This part takes the data from the rows and columns and creates a VALUES query.

The code below looks daunting, but try and analyze every line and understand what it is doing.

for r in data_list:
    q_body = 'VALUES ('
    c_count = 0
    for c in r:
        if c_count >= 0 and c_count <= 4:
            q_body += '"' + str(data_list[r_count][c_count]) + '",'
        elif c_count == 5: 
            q_body += '"' + str(data_list[r_count][c_count])[0:10] + '",'
        else:
            q_body += '"' +  str(data_list[r_count][c_count]) + '");'
        c_count += 1
    try:
        print(q_hdr+q_body)
        cursor.execute(q_hdr+q_body)
        connection.commit()
    except:
        print('Error')
        exit()
    r_count += 1

connection.close()

Output:

Below is an example of one complete insert statement from the above lines of code. You will see that inside the try statement, cursor.execute(q_hdr+q_body) combines the two to create a full query for each record.

INSERT INTO statsdata (Record_ID,Offense_Code,Police_District,Beat,Grid,Occurence_Date,Occurence_Time) VALUES ("1359411","1315","3","3B","736","2019-01-01", "08:58");

The final step in this process is to confirm the above records were added. We can do this by navigating to the localhost/server and entering the appropriate host and login credentials.

Once you have reached the phpMyAdmin page, click to select the appropriate database and table. The right-side refreshes to display the records inserted from the code above.

Summary

In this article, we learned how to:

  • Create a USE Query
  • Create MySQL Database Table
  • Insert Records into the Table