17 Ways to Read a CSV File to a Pandas DataFrame

💬 Question: How to import a CSV file to a Pandas DataFrame in Python?

This article will discuss the most interesting examples to read a CSV file to a Pandas DataFrame.

If not specified otherwise, we use the following CSV file for all examples:

my_file.csv:

Name,Job,Age,Income
Alice,Programmer,23,110000
Bob,Executive,34,90000
Carl,Sales,45,50000

Let’s get started!

Example 1 – CSV to DataFrame

To import a given CSV file into a newly-created Pandas DataFrame, use the pd.read_csv('my_file.csv') function that returns a DataFrame created with the content in the CSV file 'my_file.csv'.

Here’s a quick and generic code snippet showcasing this approach:

import pandas as pd
df = pd.read_csv('my_file.csv')
print(df)

Output:

    Name         Job  Age  Income
0  Alice  Programmer   23  110000
1    Bob   Executive   34   90000
2   Carl       Sales   45   50000

You convert the CSV file to a DataFrame in three steps: (1) import the pandas library, (2) use pd.read_csv() and pass the filename as a string argument, and (3) print the resulting DataFrame (optional).

import pandas as pd
df = pd.read_csv('my_file.csv')
print(df)

Example 2 – Importing Specific Column from CSV

If you want to import a specific column in the CSV, use the argument usecols=['column_name'].

The following example only reads the column 'Name' using this approach:

import pandas as pd
df = pd.read_csv('my_file.csv', usecols= ['Name'])
print(df)

Output:

    Name
0  Alice
1    Bob
2   Carl

Example 3 – Importing Subset of Columns from CSV

If you want to import a subset of columns from the CSV, use the argument usecols= ['column_name_1','column_name_2'].

import pandas as pd
df = pd.read_csv('my_file.csv', usecols= ['Name','Age'])
print(df)

Output:

    Name  Age
0  Alice   23
1    Bob   34
2   Carl   45

Example 4 – Using Custom Separator for CSV File

You can specify a custom separator for the CSV file by adding the optional argument sep.

To use the tabular separator, use sep='\t' as an argument for pd.read_csv().
To use the semicolon separator, use sep=';' as an argument for pd.read_csv().
The default separator argument of pd.read_csv() is sep=','.

import pandas as pd
df = pd.read_csv('my_file.csv', sep='\t')
print(df)

If the tabular character was the separator in the file 'my_file.csv', this example will generate the same DataFrame as already shown in example 1:

    Name         Job  Age  Income
0  Alice  Programmer   23  110000
1    Bob   Executive   34   90000
2   Carl       Sales   45   50000

Example 5 – Read CSV with Custom Headers

Per default, the first row of the CSV file will be used as a header of the DataFrame by the df.read_csv() function.

To read a CSV to a DataFrame with custom headers, call pd.read_csv("my_file.csv", skiprows=1, names=column_names).

skiprows=1 ensures that you’ll skip the first row that contains the default column names. We’ll explore this in more detail below.
names=column_names passes the column names stored in the variable as a list ['Spouse', 'Former Job', 'Insta Followers', 'Insta Likes']. These are the new column names (=headers) of the newly-created DataFrame.

Here’s an example:

import pandas as pd
column_names = ['Spouse', 'Former Job', 'Insta Followers', 'Insta Likes']
df = pd.read_csv("my_file.csv", skiprows=1, names=column_names)
print(df)

Output:

  Spouse  Former Job  Insta Followers  Insta Likes
0  Alice  Programmer               23       110000
1    Bob   Executive               34        90000
2   Carl       Sales               45        50000

Example 6 – Skip First n Rows from CSV using Custom Headers

import pandas as pd

n = 2  # no. of rows to skip
column_names = ['Spouse', 'Former Job', 'Insta Followers', 'Insta Likes']
df = pd.read_csv("my_file.csv", skiprows=2, names=column_names)

print(df)

Output:

  Spouse Former Job  Insta Followers  Insta Likes
0    Bob  Executive               34        90000
1   Carl      Sales               45        50000

The first row (=original header) and the second row (=first line of CSV) are now skipped when setting skiprows=2.

Example 7 – Read CSV with Header in i-th Row

To read the CSV file with the header in the i-th row, you can pass the row index i as header argument like so: pd.read_csv("my_file.csv", header = i). The i-th row will now be taken as the column headers of the newly-created DataFrame.

The following example shows how to read a CSV to a DataFrame with the header in the second row with index i=1:

import pandas as pd

i = 1    # header row index
df = pd.read_csv("my_file.csv", header = i)

print(df)

Output:

  Alice Programmer  23  110000
0   Bob  Executive  34   90000
1  Carl      Sales  45   50000

In this example, the second row of the CSV is taken as a header. The first row is ignored altogether.

Example 8 – Read CSV Without Header

If you specify header=None when calling pd.read_csv("my_file.csv", header=None), Python will assign successive integer values (0, 1, 2, …) as column names.

import pandas as pd
df = pd.read_csv("my_file.csv", header=None)
print(df)

Output:

       0           1    2       3
0   Name         Job  Age  Income
1  Alice  Programmer   23  110000
2    Bob   Executive   34   90000
3   Carl       Sales   45   50000

Example 9 – Read CSV Custom Header Names: Prefix

If you specify header=None when calling pd.read_csv("my_file.csv", header=None), Python will assign successive integer values (0, 1, 2, …) as column names.

If you want to modfiy the column names in a batch, you can additionally specify the prefix argument, i.e., a prefix string that will be inserted before each successive integer column name.

For example, prefix='Col_' in combination with header=None will yield the column names:

'Col_0'
'Col_1'
'Col_2'
…

Here’s the code specification:

import pandas as pd
df = pd.read_csv("my_file.csv", header=None, prefix='Col_')
print(df)

Output:

   Col_0       Col_1 Col_2   Col_3
0   Name         Job   Age  Income
1  Alice  Programmer    23  110000
2    Bob   Executive    34   90000
3   Carl       Sales    45   50000

Example 10 – Set Specific Values as NaN (Missing Value)

If you want to set specific values in the CSV as “Not a Number” (NaN), you can use the list argument na_values=[val_1, ...] that has an element for each value to be set as NaN when loading the CSV into a DataFrame.

Here’s an example setting the values 'Alice' and 23 to NaN.

import pandas as pd
df = pd.read_csv("my_file.csv", na_values=['Alice', 23])
print(df)

Output:

   Name         Job   Age  Income
0   NaN  Programmer   NaN  110000
1   Bob   Executive  34.0   90000
2  Carl       Sales  45.0   50000

The missing values are highlighted in bold.

Example 11 – Load CSV at URL to DataFrame

Given the string URL with the location of the desired CSV file (e.g., "https://www.my_url.com/my_file.csv"). You can load the contents into a DataFrame by passing the URL as first argument of the pd.read_csv() function instead of the local file path.

Pandas will automagially figure out that it’s an URL. 😉

import pandas as pd
df = pd.read_csv("https://www.my_url.com/my_file.csv")
print(df)

Output — assuming the CSV file actually exists:

    Name         Job  Age  Income
0  Alice  Programmer   23  110000
1    Bob   Executive   34   90000
2   Carl       Sales   45   50000

Example 12 – Skip Last i Rows Loading CSV to DataFrame

You can skip the last i rows by setting the skipfooter=i argument in the pd.read_csv() function. The first row of the CSV will still be the header used for the column names. But the last i rows are ignored from the CSV.

Here’s an example:

import pandas as pd
df = pd.read_csv("my_file.csv", skipfooter=2)
print(df)

Output:

    Name         Job  Age  Income
0  Alice  Programmer   23  110000

The first row is still taken as header. But the last i=2 rows are skipped, so the remaining DataFrame has only one row.

Example 13 – Ignoring Comment Lines When Loading CSV to DataFrame

Assume you have a CSV file with some comments (e.g., starting with the hashtag character '#').

'my_file.csv':

Name,Job,Age,Income
Alice,Programmer,23,110000
# Ignore this line
Bob,Executive,34,90000
Carl,Sales,45,50000
# This line is ignored too

You can load the CSV and ignore lines that start with a specific “commenting character” such as the symbol '#' by passing the argument comment='#' into the pd.read_csv() function.

Here’s an example:

import pandas as pd
df = pd.read_csv("my_file.csv", comment="#")
print(df)

The output is the DataFrame with commented lines ignored:

    Name         Job  Age  Income
0  Alice  Programmer   23  110000
1    Bob   Executive   34   90000
2   Carl       Sales   45   50000

Without setting the comment argument, the DataFrame would attempt to load commented lines which can yield a very odd behavior in practical code projects:

                         Name         Job   Age    Income
0                       Alice  Programmer  23.0  110000.0
1          # Ignore this line         NaN   NaN       NaN
2                         Bob   Executive  34.0   90000.0
3                        Carl       Sales  45.0   50000.0
4  # This line is ignored too         NaN   NaN       NaN

The highlighted lines are clearly not the desired behavior, so use the comment argument accordingly to fix this!

Example 14 – Measure Elapsed Runtime when Importing CSV to DataFrame

The verbose=True argument of the pd.read_csv() function allows you to measure the elapsed runtime when loading a CSV file.

For small CSV files, the elapsed runtime will always yield 0 due to the high speed efficiency. But for performance-heavy projects or large-scale CSV files, this argument can make your life much easier!

import pandas as pd
df = pd.read_csv("my_file.csv", verbose=True)
print(df)

Output:

Tokenization took: 0.00 ms
Type conversion took: 0.00 ms
Parser memory cleanup took: 0.00 ms
    Name         Job  Age  Income
0  Alice  Programmer   23  110000
1    Bob   Executive   34   90000
2   Carl       Sales   45   50000

In our case, the elapsed runtime was real quick due to the small CSV file. But for larger, real-world CSV files, this can give you a great idea on your code’s bottlenecks for performance analysis.

Example 15 – Pandas DataFrame to CSV

Given a Pandas DataFrame. How to save it in a CSV file?

You can convert a list of lists to a Pandas DataFrame that provides you with powerful capabilities such as the to_csv() method. This is the easiest method and it allows you to avoid importing yet another library (I use Pandas in many Python projects anyways).

salary = [['Alice', 'Data Scientist', 122000],
          ['Bob', 'Engineer', 77000],
          ['Ann', 'Manager', 119000]]

import pandas as pd
df = pd.DataFrame(salary)
df.to_csv('file2.csv', index=False, header=False)

Output:

# file2.csv
Alice,Data Scientist,122000
Bob,Engineer,77000
Ann,Manager,119000

🌍 Learn More: You can learn more about this approach in our detailed Finxter blog article.

Example 16 – CSV to DataFrame Setting Specific Values as True

You can set specific values as True when loading the CSV file to a DataFrame by defining the true_values argument. Simply pass a list of values to be used as True values in the pd.read_csv() function.

For example, true_values=['Alice', 'Bob'] will set each occurrence of either 'Alice' or 'Bob' as True in the DataFrame.

import pandas as pd
df = pd.read_csv("my_file.csv", true_values=['Alice', 'Bob'])
print(df)

Output:

    Name         Job  Age  Income
0  True  Programmer   23  110000
1    True   Executive   34   90000
2   Carl       Sales   45   50000

Example 17 – CSV to DataFrame Setting Specific Values as False

You can set specific values as False when loading the CSV file to a DataFrame by defining the false_values argument. Simply pass a list of values to be used as False values in the pd.read_csv() function.

For example, false_values=['Alice', 'Bob'] will set each occurrence of either 'Alice' or 'Bob' as False in the DataFrame.

import pandas as pd
df = pd.read_csv("my_file.csv", false_values=['Alice', 'Bob'])
print(df)

Output:

    Name         Job  Age  Income
0  False  Programmer   23  110000
1    False   Executive   34   90000
2   Carl       Sales   45   50000

More Python CSV Conversions

🐍 Learn More: I have compiled an “ultimate guide” on the Finxter blog that shows you the best method, respectively, to convert a CSV file to JSON, Excel, dictionary, Parquet, list, list of lists, list of tuples, text file, DataFrame, XML, NumPy array, and list of dictionaries.

Example 1 – CSV to DataFrame

Example 2 – Importing Specific Column from CSV

Example 3 – Importing Subset of Columns from CSV

Example 4 – Using Custom Separator for CSV File

Example 5 – Read CSV with Custom Headers

Example 6 – Skip First n Rows from CSV using Custom Headers

Example 7 – Read CSV with Header in i-th Row

Example 8 – Read CSV Without Header

Example 9 – Read CSV Custom Header Names: Prefix

Example 10 – Set Specific Values as NaN (Missing Value)

Example 11 – Load CSV at URL to DataFrame

Example 12 – Skip Last i Rows Loading CSV to DataFrame

Example 13 – Ignoring Comment Lines When Loading CSV to DataFrame

Example 14 – Measure Elapsed Runtime when Importing CSV to DataFrame

Example 15 – Pandas DataFrame to CSV

Example 16 – CSV to DataFrame Setting Specific Values as True

Example 17 – CSV to DataFrame Setting Specific Values as False

Related Video

More Python CSV Conversions

Programmer Humor

Resources