Pandas DataFrame to_sql() Method


Preparation

Before any data manipulation can occur, two (2) new libraries will require installation.

  • The Pandas library enables access to/from a DataFrame.
  • The Openpyxl library enables conversion to/from Excel.

To install these libraries, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

$ pip install openpyxl

Hit the <Enter> key on the keyboard to start the installation process.

If the installations were successful, a message displays in the terminal indicating the same.


Feel free to view the PyCharm installation guide for the required libraries.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd
import openpyxl

DataFrame.to_sql()

The to_sql() method writes rows (records) from a DataFrame to a SQL database.

The syntax for this method is as follows.

DataFrame.to_sql(name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)
ParameterDescription
nameThis parameter is the name of the SQL table.
conThis parameter is the database connections details.
schemaSpecify the database schema.
if_existsThis parameter determines what to do if the specified table exists.
The options are: fail, replace, or append.
indexThis parameter signifies the column to be set as the index.
index_labelThis parameter is the column name of the index.
chunksizeSpecify the number of rows to batch add in each round.
dtypeSpecify the data type for each column. Dictionary or scalar.
methodThis parameter controls the insertion clause. Options are:
None: use the SQL INSERT statement.
Multi: pass multiple values to a single INSERT statement
Callable: requires a sample implementation statement

This example reads in the countries.csv file. These records are then inserted into a Database using sqlite3. To follow along, click here to save this CSV file and move it to the current working directory.

conn = sqlite3.connect('countries_database')
c = conn.cursor()

c.execute('CREATE TABLE IF NOT EXISTS countries(Country text, Capital text, Population integer, Area integer)')
conn.commit()

df = pd.read_csv('countries.csv')
df.to_sql('countries', conn, if_exists='replace', index=False)
 
c.execute('SELECT * FROM countries')

for row in c.fetchall():
    print (row)
  • Line [1] creates a connection to the countries_database.
  • Line [2] executes the code on Line [1] and creates the Database.
  • Line [3] contains SQL code to create a database table containing the specified fields.
  • Line [4] executes the code on Line [3] and creates the table.
  • Line [5] reads in the countries.csv file to the DataFrame df.
  • Line [6] converts the DataFrame to SQL and inserts all records into the Database.
  • Line [7] executes the SQL SELECT command to retrieve all records from the Database.
  • Line [8-9] instantiates a for loop and displays one (1) record/line from the Database.

Output

(‘Germany’, ‘Berlin’, 83783942, 357021)
(‘France’, ‘Paris’, 67081000, 551695)
(‘Spain’, ‘Madrid’, 47431256, 498511)
(‘Italy’, ‘Rome’, 60317116, 301338)
(‘Poland’, ‘Warsaw’, 38383000, 312685)
(‘Russia’, ‘Moscow’, 146748590, 17098246)
(‘USA’, ‘Washington’, 328239523, 9833520)
(‘China’, ‘Beijing’, 1400050000, 9596961)
(‘India’, ‘Dheli’, 1352642280, 3287263)

πŸ’‘Β Note: For additional details on SQL commands, click here.

More Pandas DataFrame Methods

Feel free to learn more about the previous and next pandas DataFrame methods (alphabetically) here:

Also, check out the full cheat sheet overview of all Pandas DataFrame methods.