Preparation
Before any data manipulation can occur, two (2) new libraries will require installation.
- The Pandas library enables access to/from a DataFrame.
- The Openpyxl library enables conversion to/from Excel.
To install these libraries, navigate to an IDE terminal. At the command prompt ($
), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($
). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter> key on the keyboard to start the installation process.
$ pip install openpyxl
Hit the <Enter> key on the keyboard to start the installation process.
If the installations were successful, a message displays in the terminal indicating the same.
Feel free to view the PyCharm installation guide for the required libraries.
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd import openpyxl
DataFrame.to_sql()
The to_sql()
method writes rows (records) from a DataFrame to a SQL database.
The syntax for this method is as follows.
DataFrame.to_sql(name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)
Parameter | Description |
---|---|
name | This parameter is the name of the SQL table. |
con | This parameter is the database connections details. |
schema | Specify the database schema. |
if_exists | This parameter determines what to do if the specified table exists. The options are: fail, replace, or append. |
index | This parameter signifies the column to be set as the index. |
index_label | This parameter is the column name of the index. |
chunksize | Specify the number of rows to batch add in each round. |
dtype | Specify the data type for each column. Dictionary or scalar. |
method | This parameter controls the insertion clause. Options are: – None : use the SQL INSERT statement.– Multi : pass multiple values to a single INSERT statement– Callable: requires a sample implementation statement |
This example reads in the countries.csv
file. These records are then inserted into a Database using sqlite3. To follow along, click here to save this CSV file and move it to the current working directory.
conn = sqlite3.connect('countries_database') c = conn.cursor() c.execute('CREATE TABLE IF NOT EXISTS countries(Country text, Capital text, Population integer, Area integer)') conn.commit() df = pd.read_csv('countries.csv') df.to_sql('countries', conn, if_exists='replace', index=False) c.execute('SELECT * FROM countries') for row in c.fetchall(): print (row)
- Line [1] creates a connection to the countries_database.
- Line [2] executes the code on Line [1] and creates the Database.
- Line [3] contains SQL code to create a database table containing the specified fields.
- Line [4] executes the code on Line [3] and creates the table.
- Line [5] reads in the
countries.csv
file to the DataFramedf
. - Line [6] converts the DataFrame to SQL and inserts all records into the Database.
- Line [7] executes the
SQL SELECT
command to retrieve all records from the Database. - Line [8-9] instantiates a for loop and displays one (1) record/line from the Database.
Output
(‘Germany’, ‘Berlin’, 83783942, 357021) |
(‘France’, ‘Paris’, 67081000, 551695) |
(‘Spain’, ‘Madrid’, 47431256, 498511) |
(‘Italy’, ‘Rome’, 60317116, 301338) |
(‘Poland’, ‘Warsaw’, 38383000, 312685) |
(‘Russia’, ‘Moscow’, 146748590, 17098246) |
(‘USA’, ‘Washington’, 328239523, 9833520) |
(‘China’, ‘Beijing’, 1400050000, 9596961) |
(‘India’, ‘Dheli’, 1352642280, 3287263) |
π‘Β Note: For additional details on SQL commands, click here.
More Pandas DataFrame Methods
Feel free to learn more about the previous and next pandas DataFrame methods (alphabetically) here:
Also, check out the full cheat sheet overview of all Pandas DataFrame methods.