Python Working with the Pandas DataFrame & MySQL – Part 2

Rate this post

Background

Part 2 of this series centers around running more complex MySQL queries. This skill is a must-have if you are interested in pursuing a career as a Data Scientist.

After completing Part 1 of this series, you should be comfortable:

  • Installing Libraries
  • Downloading and saving a CSV file from the web.
  • Creating a Pandas DataFrame
  • Retrieving Total Number of Records from a DataFrame
  • Querying a DataFrame
  • Sorting a DataFrame
  • Saving a DataFrame

Preparation

This article assumes you have completed the following from Part 1:

  • Installed all of the required plugins.
  • Saved crimes_01.csv to the current working directory.

Add the Required Starter Code to the top of each code snippet. This snippet will allow the code to run error-free.

import pandas as pd
import mysql.connector
from mysql.connector import Error

file_name = 'crimes_01.csv'

Read CSV File

For this article, we read in the crimes_01.csv file used in Part 1 of this series. This file contains all the modifications made to reduce the total DataFrame from 39,504 to 150 rows.

💡 Note: This code assumes the above CSV file resides in the current working directory.

df = pd.read_csv(file_name, index_col=['Record_ID'], parse_dates=['Occurence_Date'])
print(df.head(4))
  • Line [1] reads in the CSV file mentioned above, sets the index to Record_ID, and sets parse_dates to a list.
  • Line [2] outputs the first four lines of the file to the terminal.

Output

  Record_IDOffense_CodePolice_DistrictBeat Grid           Occurence_Date
1359460     531111A03022019-01-01 08:31:00+00:00
1359689           299911A01062019-01-02 03:31:00+00:00
1359713           220211A03442019-01-01 20:37:00+00:00
1359747           229911A03382019-01-02 21:00:00+00:00

Insert a DataFrame Column

In the Occurence_Date field, the Date and Time for each record are together. The code below adds a column to the existing DataFrame.

occur_time = []
for label, content in df.items():
    if label == 'Occurence_Date':
        for c in content:
            c = str(c)
            occur_time.append(c[11:16])

df.insert(5, 'Occurence_Time', occur_time, True)
df.to_csv('crimes_01.csv')

This code does the following:

  • Line [1] declares an empty list (occur_time).
  • Line [2] instantiates a for loop to access each item in the DataFrame.
  • Line [3] checks for the Occurence_Date column (label).
  • Line [4] loops through the contents of each Occurence_Date.
  • Line [5] converts the contents to a string to manipulate the data.
  • Line [6] strips the time from the Occurence_Date and appends this value to the list.
  • Line [7] adds a new column, inserts the values in the occur_time list, and sets allow duplicates to True.
  • Line [8] saves these changes.

Output (Partial)


Connect to MySQL

To connect to the MySQL database, the appropriate library requires installation. Click here for instructions.

This article also assumes you have access to a localhost/server with MySQL. You will also require login credentials with the appropriate permissions.

For this example, we will be connecting to localhost via phpMyAdmin.


Check the Login Credentials Manually

Before additional coding, it is good to test the login credentials live. In this regard, navigate to your localhost/server, enter the login credentials and click the Go button.

If the login is successful, a screen similar to the following will display. This view will vary based on the databases and tables listed on the left.


Check the Login Credentials via Python

For this example, the code below will connect using the appropriate login credentials mentioned earlier.

try:
    connection  = mysql.connector.connect(
        host    = 'localhost',
        user    = 'username',
        passwd  = 'password'
    )
except Error as e:
    print(f"The error '{e}' occurred")

print('Connected!')
  • Line [1] initiates the try statement.
  • Line [2] is the connection string used to attempt to connect to the localhost/server.
  • Line [3] is the except statement. If the connection fails, the code fails here.
  • Line [4] outputs the connection error that occurred.
  • Line [5] outputs Connected! if the connection was successful.

Note: if the connection was unsuccessful, you need to delve further to resolve this issue before continuing.


Create a MySQL Database

With the connection above still open, run the following code to create a new database on the localhost.

If a database with the same name exists, this code does not run.

query  = 'CREATE DATABASE IF NOT EXISTS crimestats2019; '
cursor = connection.cursor()

try:
    cursor.execute(query)
    connection.commit()
    print('Database was created')
except Error as e:
    print(f"The error '{e}' occurred")

connection.close()
  • Line [1] creates a MySQL query.
  • Line [2] creates a cursor to reference the connection.
  • Line [3] initiates the try statement.
  • Line [4] attempts to execute the query.
  • Line [5] if the execution is successful, the query commits.
  • Line [6] outputs the message 'Database was created' to the terminal.
  • Line [7] is the except statement. If the query fails, the code falls here.
  • Line [8] outputs an error message to the terminal.
  • Line [9] closes the open connection.

To confirm the database was created, navigate to the localhost/server (see instructions above). Refresh the view (F5) to see the crimestats2019 database.


Summary

In this article, we learned how to:

  • Read in a CSV File
  • Insert a DataFrame Column
  • Add data to a new Column
  • Log in to a Localhost/Server
  • Connect to MySQL
  • Execute a MySQL query
  • Create a MySQL Database

Next Up

Part 3 focuses on the following:

  • Creating a Use Query
  • Creating a MySQL Database Table
  • Inserting Records