5 Best Ways to Create a SQLite Database from CSV with Python

πŸ’‘ Problem Formulation: You have a CSV file containing data that needs to be transferred into a queryable, efficient, and durable format like a SQLite database. Your input is a CSV file, such as data.csv, which includes columns and rows of data. The desired output is a SQLite database file, such as data.db, with a table that reflects the structure and content of the original CSV file.
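
The examples that follow assume a small three-column file with id, name, and age fields (the schema used in Method 2). A sketch like this, with made-up sample rows, can generate such a file for experimentation:

```python
import csv

# Write a small illustrative data.csv (column names and rows are assumptions)
rows = [
    ("id", "name", "age"),
    (1, "Alice", 30),
    (2, "Bob", 25),
]
with open("data.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```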

Method 1: Using Pandas and to_sql()

Pandas is a powerful data analysis library that simplifies many data operations. By reading the CSV into a DataFrame and then calling the to_sql() method, Python can export the data directly into a SQLite database. This method is convenient when the data needs manipulation before insertion; for files too large to fit in memory, read_csv()'s chunksize parameter lets you load and write the data in pieces.

Here’s an example:

import pandas as pd
from sqlalchemy import create_engine

# Read CSV file into DataFrame
df = pd.read_csv('data.csv')

# Create SQLite engine and save to a database file
engine = create_engine('sqlite:///data.db')
df.to_sql('my_table', con=engine, if_exists='replace', index=False)

The output will be a SQLite database file named data.db with a table named my_table containing the CSV data.

This snippet creates a connection to a SQLite database using SQLAlchemy’s create_engine(). Then it reads a CSV file into a Pandas DataFrame and calls to_sql() to save the DataFrame to the SQLite database, replacing any existing table with the same name.
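
For CSV files too large to hold in memory at once, a variant of the same idea streams the file with read_csv()'s chunksize parameter. Here is a sketch, using a tiny inline CSV and a deliberately small chunk size for illustration (to_sql() also accepts a plain sqlite3 connection, so SQLAlchemy is optional here):

```python
import sqlite3
import pandas as pd

# Tiny illustrative CSV so the sketch runs standalone
with open("data.csv", "w") as f:
    f.write("id,name,age\n1,Alice,30\n2,Bob,25\n")

conn = sqlite3.connect("data.db")
# chunksize=1 keeps the demo small; real code would use thousands of rows
for i, chunk in enumerate(pd.read_csv("data.csv", chunksize=1)):
    # Replace the table on the first chunk, append for the rest
    chunk.to_sql("my_table", conn, if_exists="replace" if i == 0 else "append", index=False)
conn.commit()
conn.close()
```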

Method 2: Using SQLite3 and csv.reader()

sqlite3 is Python's built-in module for working with SQLite databases. This method creates the database and table schema manually, then reads the CSV with the standard library's csv.reader() and inserts the rows one by one. It is highly customizable and gives finer control over database and table creation.

Here’s an example:

import csv
import sqlite3

# Establish a connection to the database
conn = sqlite3.connect('data.db')
cursor = conn.cursor()

# Create a table with the appropriate schema
cursor.execute('CREATE TABLE IF NOT EXISTS my_table (id INTEGER, name TEXT, age INTEGER)')

# Read CSV, skip the header row, and insert the data rows into the table
with open('data.csv', 'r', newline='') as file:
    csv_reader = csv.reader(file)
    next(csv_reader)  # skip the header row
    for row in csv_reader:
        cursor.execute('INSERT INTO my_table VALUES (?, ?, ?)', row)

# Save (commit) the changes and close the connection
conn.commit()
conn.close()

The output will be a SQLite database named data.db with a custom table my_table, filled with data from data.csv.

The code reads a CSV file using csv.reader(), creates a connection to a SQLite database file, defines the table schema, then inserts each row of the CSV into the SQLite table. It’s an explicit method that can be advantageous for complex schemas or preprocessing before database insertion.
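
When no per-row preprocessing is needed, the same approach can hand the whole reader to executemany(), which batches the inserts. A sketch assuming the same three-column schema, with a tiny inline CSV so it runs standalone:

```python
import csv
import sqlite3

# Tiny illustrative CSV so the sketch runs standalone
with open("data.csv", "w") as f:
    f.write("id,name,age\n1,Alice,30\n2,Bob,25\n")

conn = sqlite3.connect("data.db")
conn.execute("DROP TABLE IF EXISTS my_table")
conn.execute("CREATE TABLE my_table (id INTEGER, name TEXT, age INTEGER)")

with open("data.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    # executemany consumes the rest of the reader in one batched call
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", reader)

conn.commit()
conn.close()
```

SQLite's type affinity converts the text values from the CSV into integers for the INTEGER columns, so the stored data matches the declared schema.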

Method 3: Using the sqlite3 module and pandas’s DataFrame.iterrows()

The sqlite3 module can be used in conjunction with pandas: after loading the CSV into a DataFrame, you can iterate over its rows with iterrows() and perform the insertions yourself, striking a balance between DataFrame-level manipulation and the explicit database operations of sqlite3.

Here’s an example:

import pandas as pd
import sqlite3

# Load CSV file into DataFrame
df = pd.read_csv('data.csv')

# Connect to SQLite database
conn = sqlite3.connect("data.db")

# Create a table
df[:0].to_sql("my_table", conn, if_exists='replace', index=False)

# Iterate over DataFrame rows and insert into the table;
# .item() converts NumPy scalars (e.g. int64) into types sqlite3 can bind
for index, row in df.iterrows():
    conn.execute("INSERT INTO my_table VALUES (?, ?, ?)",
                 [v.item() if hasattr(v, "item") else v for v in row])

# Commit and close
conn.commit()
conn.close()

The output is a SQLite database called data.db, with table data inserted row by row from the DataFrame.

This code snippet demonstrates how to load a CSV file into a Pandas DataFrame, create the SQLite table schema from the DataFrame structure, and then iteratively insert each DataFrame row into the SQLite table. This method provides granular control over insertion and can be useful when additional row-level processing is needed.
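
The three hard-coded placeholders tie the snippet to a three-column file. One possible generalization derives the placeholder list from the DataFrame's column count, sketched below with a tiny inline CSV (note that NumPy scalars must be converted to native Python types before sqlite3 will bind them):

```python
import sqlite3
import pandas as pd

# Tiny illustrative CSV so the sketch runs standalone
with open("data.csv", "w") as f:
    f.write("id,name,age\n1,Alice,30\n2,Bob,25\n")

df = pd.read_csv("data.csv")
conn = sqlite3.connect("data.db")
df[:0].to_sql("my_table", conn, if_exists="replace", index=False)

# Derive the placeholder list from the column count so any CSV width works
placeholders = ", ".join("?" * len(df.columns))
for _, row in df.iterrows():
    # .item() converts NumPy scalars to native Python types for sqlite3
    conn.execute(f"INSERT INTO my_table VALUES ({placeholders})",
                 [v.item() if hasattr(v, "item") else v for v in row])

conn.commit()
conn.close()
```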

Method 4: Using csvkit utility

csvkit is a suite of command-line tools for converting to and working with CSV, installable with pip install csvkit. Its csvsql utility can create a SQLite database directly from the command line. This method is concise and does not require writing a full Python script, making it ideal for quick and simple conversions.

Here’s an example:

csvsql --db sqlite:///data.db --tables my_table --insert data.csv

The output will be the creation of data.db with the data from data.csv inserted into a new table my_table.

The csvkit csvsql utility reads the CSV file and performs the table creation and data insertion in a single command. It’s efficient for simple tasks, but the lack of customization can be limiting for more complex scenarios that require additional data manipulation.

Bonus One-Liner Method 5: Pandas oneliner with sqlite3

For those seeking utmost simplicity, it is possible to combine the power of pandas and sqlite3 in a one-liner that reads a CSV and directly inserts it into a SQLite database. This method is perfect for quick and dirty data dumps without the need for preprocessing or additional database configuration.

Here’s an example:

import sqlite3
import pandas as pd

pd.read_csv('data.csv').to_sql('my_table', sqlite3.connect('data.db'), if_exists='replace', index=False)

The result is an instantly created and populated my_table in the database data.db.

This code leverages the convenience of Pandas to read the CSV into a DataFrame and immediately saves it to a SQLite database with to_sql(). It uses a connection established through the sqlite3 module but is limited in that it does not allow fine control over the process.
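
A quick way to check the result is to read the table straight back with pandas' read_sql(). A sketch, again with a tiny inline CSV:

```python
import sqlite3
import pandas as pd

# Tiny illustrative CSV, then the one-liner dump
with open("data.csv", "w") as f:
    f.write("id,name,age\n1,Alice,30\n2,Bob,25\n")
pd.read_csv("data.csv").to_sql("my_table", sqlite3.connect("data.db"), if_exists="replace", index=False)

# Read the table back to verify the round trip
conn = sqlite3.connect("data.db")
out = pd.read_sql("SELECT * FROM my_table", conn)
conn.close()
print(out)
```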

Summary/Discussion

  • Method 1: Using Pandas and to_sql(). Strengths: Ideal for large datasets and data manipulation before insertion. Weaknesses: Requires Pandas and SQLAlchemy dependencies.
  • Method 2: Using SQLite3 and csv.reader(). Strengths: Finer control over database and schema creation, no external dependencies needed. Weaknesses: More verbose and manual process.
  • Method 3: Using sqlite3 and pandas’s DataFrame.iterrows(). Strengths: Balance between DataFrame manipulation and explicit database operations. Weaknesses: Potentially less efficient for large datasets due to row-wise insertion.
  • Method 4: Using csvkit utility. Strengths: Quick and simple command-line operation for straightforward conversions. Weaknesses: Limited control over the conversion process; requires installing csvkit.
  • Method 5: Pandas one-liner with sqlite3. Strengths: Maximum simplicity for rapid database creation. Weaknesses: Limited customization and processing capabilities.