Background & Preparation
Part 3 of this series centers around creating a MySQL table and inserting records from the CSV file referenced in Part 2. Working with MySQL is a must-have if you are interested in pursuing a career as a Data Scientist.
After completing Part 2, you should be comfortable:
- Connecting to a Localhost/Server
- Writing and executing a simple query to create a database
- Closing any open localhost/server connections
π‘ Note: At the end of Part 2, we saved our new DataFrame to crimes_01.csv. We will be reading in and accessing this same file in this article.
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd import mysql.connector from mysql.connector import Error file_name = 'crimes_01.csv' try: connection = mysql.connector.connect( host = 'localhost', user = 'username', passwd = 'password') except Error as e: print(f"The error '{e}' occurred")
π‘Β Note: Modify the code above with the appropriate host and login credentials to ensure a successful connection. This snippet will allow the code to run error-free.
Create USE Query
Part 2 covered creating a MySQL database. A table will be created and attached to this database (crimestats2019).
Before adding a table, a database first needs to be selected. This step lets MySQL know which database to associate with a table. To select a database, run the code below.
query = 'USE crimestats2019;' cursor = connection.cursor() try: cursor.execute(query) connection.commit() print('Database selection was successful.') except Error as e: print(f"The error '{e}' occurred")
- Line [1] creates a query statement to select the database.
- Line [2] then creates a cursor to reference the connection.
- Line [3] initiates the try statement.
- Line [4] attempts to execute the query statement on line [9].
- Line [5] if successful, the connection commits the query (select the database).
- Line [6] outputs a message to the terminal.
- Line [7] is the except statement. If the connection fails, the code falls here.
- Line [8] outputs the error to the terminal.
Output
Database selection was successful.
Create MySQL Database Table
With a successful connection and the database selected, we can move forward. Letβs start by opening the CSV file crimes_01.csv to assess the structure.
Navigate to the current working directory for this example and double-click to open the CSV in Excel. Below, find the firsts few lines of the crimes_01.csv file.
The next step is to create a query that adds a table to the database selected earlier. Running the code below will create a new table. Review the structure. For simplicity, all strings, except for the Primary Key are assigned the data type VARCHAR (string).
query = 'USE crimestats2019;' cursor = connection.cursor() try: cursor.execute(query) connection.commit() print('The crimestats2019 database was selected') except Error as e: print(f"The error '{e}' occurred") query = ''' CREATE TABLE statsdata ( ID int NOT NULL AUTO_INCREMENT, Record_ID VARCHAR(10), Offense_Code VARCHAR(4), Police_District VARCHAR(3), Beat VARCHAR(3), Grid VARCHAR(4), Occurence_Date VARCHAR(10), Occurence_Time VARCHAR(5), PRIMARY KEY (ID)); ''' try: cursor.execute(query) connection.commit() print('The statsdata Table was created') except Error as e: print(f"The error '{e}' occurred")
π‘ Note: To learn more about the available data types, click here.
Log in to the localhost/server to confirm the above code worked. The database and associated Table should now be visible on the left-hand side.
Click the Structure link to view the structure.
The table structure on the right-hand side matches up with our code above.
Insert Records into MySQL Table
The final step is to insert the records from the CSV into the Table.
π‘ Note: With the Required Start Code added in, you should be successfully connected to a localhost/server and have selected the appropriate database.
The next step is to read in the CSV file crimes_01.csv and assign it to a DataFrame.
There are a few changes from the DataFrame code created earlier:
- The
cols
list now contains theRecord_ID
. - The
index_cols
parameter is removed fromread_csv()
. This is not needed as we have created a new index (auto-increment) field calledID
. - The DataFrame is converted to a list.
cols = ['Record_ID', 'Offense_Code', 'Police_District', 'Beat', 'Grid', 'Occurence_Date', 'Occurence_Time'] df = pd.read_csv(file_name, sep=',', usecols=cols, parse_dates=['Occurence_Date']) data_list = df.values.tolist()
A sample row should now look like this:
[1359411, 1315, '3', '3B', '0736', Timestamp('2019-01-01 08:58:00+0000', tz='UTC'), '08:58']
Now we are ready to create the INSERT INTO query!
- In MySQL, there are two parts to inserting a record. The first part is the
INSERT INTO
statement. This statement is needed to let MySQL know what fields to expect in the query.
q_hdr = 'INSERT INTO statsdata (' for f in df.columns: q_hdr += f + ',' q_hdr = q_hdr[0:-1] + ') '
Output
INSERT INTO statsdata (Record_ID,Offense_Code,Police_District,Beat,Grid,Occurence_Date,Occurence_Time)
The above line is called and appended to the second part of the query each time a query gets executed.
The second part is the VALUES statement. This part takes the data from the rows and columns and creates a VALUES query.
The code below looks daunting, but try and analyze every line and understand what it is doing.
for r in data_list: q_body = 'VALUES (' c_count = 0 for c in r: if c_count >= 0 and c_count <= 4: q_body += '"' + str(data_list[r_count][c_count]) + '",' elif c_count == 5: q_body += '"' + str(data_list[r_count][c_count])[0:10] + '",' else: q_body += '"' + str(data_list[r_count][c_count]) + '");' c_count += 1 try: print(q_hdr+q_body) cursor.execute(q_hdr+q_body) connection.commit() except: print('Error') exit() r_count += 1 connection.close()
Output
Below is an example of one complete insert statement from the above lines of code. You will see that inside the try statement, cursor.execute(q_hdr+q_body) combines the two to create a full query for each record.
INSERT INTO statsdata (Record_ID,Offense_Code,Police_District,Beat,Grid,Occurence_Date,Occurence_Time) VALUES ("1359411","1315","3","3B","736","2019-01-01", "08:58");
The final step is to confirm that the above records were added. We can do this by navigating to the localhost/server and entering the appropriate host and login credentials.
Once you have reached the phpMyAdmin page, click to select the appropriate database and Table. The right-side refreshes to display the records inserted from the code above.
Summary
In this article, we learned how to:
- Create a USE Query
- Create MySQL Database Table
- Insert Records into the Table