17 Ways to Read a CSV File to a Pandas DataFrame

5/5 - (4 votes)

💬 Question: How to import a CSV file to a Pandas DataFrame in Python?

This article will discuss the most interesting examples to read a CSV file to a Pandas DataFrame.

If not specified otherwise, we use the following CSV file for all examples:

my_file.csv:

Name,Job,Age,Income
Alice,Programmer,23,110000
Bob,Executive,34,90000
Carl,Sales,45,50000

Let’s get started!

Example 1 – CSV to DataFrame

To import a given CSV file into a newly-created Pandas DataFrame, use the pd.read_csv('my_file.csv') function that returns a DataFrame created with the content in the CSV file 'my_file.csv'.

Here’s a quick and generic code snippet showcasing this approach:

import pandas as pd
df = pd.read_csv('my_file.csv')
print(df)

Output:

    Name         Job  Age  Income
0  Alice  Programmer   23  110000
1    Bob   Executive   34   90000
2   Carl       Sales   45   50000

You convert the CSV file to a DataFrame in three steps: (1) import the pandas library, (2) use pd.read_csv() and pass the filename as a string argument, and (3) print the resulting DataFrame (optional).

  • import pandas as pd
  • df = pd.read_csv('my_file.csv')
  • print(df)

Example 2 – Importing Specific Column from CSV

If you want to import a specific column in the CSV, use the argument usecols=['column_name'].

The following example only reads the column 'Name' using this approach:

import pandas as pd
df = pd.read_csv('my_file.csv', usecols= ['Name'])
print(df)

Output:

    Name
0  Alice
1    Bob
2   Carl

Example 3 – Importing Subset of Columns from CSV

If you want to import a subset of columns from the CSV, use the argument usecols= ['column_name_1','column_name_2'].

import pandas as pd
df = pd.read_csv('my_file.csv', usecols= ['Name','Age'])
print(df)

Output:

    Name  Age
0  Alice   23
1    Bob   34
2   Carl   45

Example 4 – Using Custom Separator for CSV File

You can specify a custom separator for the CSV file by adding the optional argument sep.

  • To use the tabular separator, use sep='\t' as an argument for pd.read_csv().
  • To use the semicolon separator, use sep=';' as an argument for pd.read_csv().
  • The default separator argument of pd.read_csv() is sep=','.
import pandas as pd
df = pd.read_csv('my_file.csv', sep='\t')
print(df)

If the tabular character was the separator in the file 'my_file.csv', this example will generate the same DataFrame as already shown in example 1:

    Name         Job  Age  Income
0  Alice  Programmer   23  110000
1    Bob   Executive   34   90000
2   Carl       Sales   45   50000

Example 5 – Read CSV with Custom Headers

Per default, the first row of the CSV file will be used as a header of the DataFrame by the df.read_csv() function.

To read a CSV to a DataFrame with custom headers, call pd.read_csv("my_file.csv", skiprows=1, names=column_names).

  • skiprows=1 ensures that you’ll skip the first row that contains the default column names. We’ll explore this in more detail below.
  • names=column_names passes the column names stored in the variable as a list ['Spouse', 'Former Job', 'Insta Followers', 'Insta Likes']. These are the new column names (=headers) of the newly-created DataFrame.

Here’s an example:

import pandas as pd
column_names = ['Spouse', 'Former Job', 'Insta Followers', 'Insta Likes']
df = pd.read_csv("my_file.csv", skiprows=1, names=column_names)
print(df)

Output:

  Spouse  Former Job  Insta Followers  Insta Likes
0  Alice  Programmer               23       110000
1    Bob   Executive               34        90000
2   Carl       Sales               45        50000

Example 6 – Skip First n Rows from CSV using Custom Headers

import pandas as pd

n = 2  # no. of rows to skip
column_names = ['Spouse', 'Former Job', 'Insta Followers', 'Insta Likes']
df = pd.read_csv("my_file.csv", skiprows=2, names=column_names)

print(df)

Output:

  Spouse Former Job  Insta Followers  Insta Likes
0    Bob  Executive               34        90000
1   Carl      Sales               45        50000

The first row (=original header) and the second row (=first line of CSV) are now skipped when setting skiprows=2.

Example 7 – Read CSV with Header in i-th Row

To read the CSV file with the header in the i-th row, you can pass the row index i as header argument like so: pd.read_csv("my_file.csv", header = i). The i-th row will now be taken as the column headers of the newly-created DataFrame.

The following example shows how to read a CSV to a DataFrame with the header in the second row with index i=1:

import pandas as pd

i = 1    # header row index
df = pd.read_csv("my_file.csv", header = i)

print(df)

Output:

  Alice Programmer  23  110000
0   Bob  Executive  34   90000
1  Carl      Sales  45   50000

In this example, the second row of the CSV is taken as a header. The first row is ignored altogether.

Example 8 – Read CSV Without Header

If you specify header=None when calling pd.read_csv("my_file.csv", header=None), Python will assign successive integer values (0, 1, 2, …) as column names.

import pandas as pd
df = pd.read_csv("my_file.csv", header=None)
print(df)

Output:

       0           1    2       3
0   Name         Job  Age  Income
1  Alice  Programmer   23  110000
2    Bob   Executive   34   90000
3   Carl       Sales   45   50000

Example 9 – Read CSV Custom Header Names: Prefix

If you specify header=None when calling pd.read_csv("my_file.csv", header=None), Python will assign successive integer values (0, 1, 2, …) as column names.

If you want to modfiy the column names in a batch, you can additionally specify the prefix argument, i.e., a prefix string that will be inserted before each successive integer column name.

For example, prefix='Col_' in combination with header=None will yield the column names:

  • 'Col_0'
  • 'Col_1'
  • 'Col_2'

Here’s the code specification:

import pandas as pd
df = pd.read_csv("my_file.csv", header=None, prefix='Col_')
print(df)

Output:

   Col_0       Col_1 Col_2   Col_3
0   Name         Job   Age  Income
1  Alice  Programmer    23  110000
2    Bob   Executive    34   90000
3   Carl       Sales    45   50000

Example 10 – Set Specific Values as NaN (Missing Value)

If you want to set specific values in the CSV as “Not a Number” (NaN), you can use the list argument na_values=[val_1, ...] that has an element for each value to be set as NaN when loading the CSV into a DataFrame.

Here’s an example setting the values 'Alice' and 23 to NaN.

import pandas as pd
df = pd.read_csv("my_file.csv", na_values=['Alice', 23])
print(df)

Output:

   Name         Job   Age  Income
0   NaN  Programmer   NaN  110000
1   Bob   Executive  34.0   90000
2  Carl       Sales  45.0   50000

The missing values are highlighted in bold.

Example 11 – Load CSV at URL to DataFrame

Given the string URL with the location of the desired CSV file (e.g., "https://www.my_url.com/my_file.csv"). You can load the contents into a DataFrame by passing the URL as first argument of the pd.read_csv() function instead of the local file path.

Pandas will automagially figure out that it’s an URL. 😉

import pandas as pd
df = pd.read_csv("https://www.my_url.com/my_file.csv")
print(df)

Output — assuming the CSV file actually exists:

    Name         Job  Age  Income
0  Alice  Programmer   23  110000
1    Bob   Executive   34   90000
2   Carl       Sales   45   50000

Example 12 – Skip Last i Rows Loading CSV to DataFrame

You can skip the last i rows by setting the skipfooter=i argument in the pd.read_csv() function. The first row of the CSV will still be the header used for the column names. But the last i rows are ignored from the CSV.

Here’s an example:

import pandas as pd
df = pd.read_csv("my_file.csv", skipfooter=2)
print(df)

Output:

    Name         Job  Age  Income
0  Alice  Programmer   23  110000

The first row is still taken as header. But the last i=2 rows are skipped, so the remaining DataFrame has only one row.

Example 13 – Ignoring Comment Lines When Loading CSV to DataFrame

Assume you have a CSV file with some comments (e.g., starting with the hashtag character '#').

'my_file.csv':

Name,Job,Age,Income
Alice,Programmer,23,110000
# Ignore this line
Bob,Executive,34,90000
Carl,Sales,45,50000
# This line is ignored too

You can load the CSV and ignore lines that start with a specific “commenting character” such as the symbol '#' by passing the argument comment='#' into the pd.read_csv() function.

Here’s an example:

import pandas as pd
df = pd.read_csv("my_file.csv", comment="#")
print(df)

The output is the DataFrame with commented lines ignored:

    Name         Job  Age  Income
0  Alice  Programmer   23  110000
1    Bob   Executive   34   90000
2   Carl       Sales   45   50000

Without setting the comment argument, the DataFrame would attempt to load commented lines which can yield a very odd behavior in practical code projects:

                         Name         Job   Age    Income
0                       Alice  Programmer  23.0  110000.0
1          # Ignore this line         NaN   NaN       NaN
2                         Bob   Executive  34.0   90000.0
3                        Carl       Sales  45.0   50000.0
4  # This line is ignored too         NaN   NaN       NaN

The highlighted lines are clearly not the desired behavior, so use the comment argument accordingly to fix this!

Example 14 – Measure Elapsed Runtime when Importing CSV to DataFrame

The verbose=True argument of the pd.read_csv() function allows you to measure the elapsed runtime when loading a CSV file.

For small CSV files, the elapsed runtime will always yield 0 due to the high speed efficiency. But for performance-heavy projects or large-scale CSV files, this argument can make your life much easier!

import pandas as pd
df = pd.read_csv("my_file.csv", verbose=True)
print(df)

Output:

Tokenization took: 0.00 ms
Type conversion took: 0.00 ms
Parser memory cleanup took: 0.00 ms
    Name         Job  Age  Income
0  Alice  Programmer   23  110000
1    Bob   Executive   34   90000
2   Carl       Sales   45   50000

In our case, the elapsed runtime was real quick due to the small CSV file. But for larger, real-world CSV files, this can give you a great idea on your code’s bottlenecks for performance analysis.

Example 15 – Pandas DataFrame to CSV

Given a Pandas DataFrame. How to save it in a CSV file?

You can convert a list of lists to a Pandas DataFrame that provides you with powerful capabilities such as the to_csv() method. This is the easiest method and it allows you to avoid importing yet another library (I use Pandas in many Python projects anyways).

salary = [['Alice', 'Data Scientist', 122000],
          ['Bob', 'Engineer', 77000],
          ['Ann', 'Manager', 119000]]

import pandas as pd
df = pd.DataFrame(salary)
df.to_csv('file2.csv', index=False, header=False)

Output:

# file2.csv
Alice,Data Scientist,122000
Bob,Engineer,77000
Ann,Manager,119000

🌍 Learn More: You can learn more about this approach in our detailed Finxter blog article.

Example 16 – CSV to DataFrame Setting Specific Values as True

You can set specific values as True when loading the CSV file to a DataFrame by defining the true_values argument. Simply pass a list of values to be used as True values in the pd.read_csv() function.

For example, true_values=['Alice', 'Bob'] will set each occurrence of either 'Alice' or 'Bob' as True in the DataFrame.

import pandas as pd
df = pd.read_csv("my_file.csv", true_values=['Alice', 'Bob'])
print(df)

Output:

    Name         Job  Age  Income
0  True  Programmer   23  110000
1    True   Executive   34   90000
2   Carl       Sales   45   50000

Example 17 – CSV to DataFrame Setting Specific Values as False

You can set specific values as False when loading the CSV file to a DataFrame by defining the false_values argument. Simply pass a list of values to be used as False values in the pd.read_csv() function.

For example, false_values=['Alice', 'Bob'] will set each occurrence of either 'Alice' or 'Bob' as False in the DataFrame.

import pandas as pd
df = pd.read_csv("my_file.csv", false_values=['Alice', 'Bob'])
print(df)

Output:

    Name         Job  Age  Income
0  False  Programmer   23  110000
1    False   Executive   34   90000
2   Carl       Sales   45   50000

Related Video

Read and Write Flat Files with Pandas

More Python CSV Conversions

🐍 Learn More: I have compiled an “ultimate guide” on the Finxter blog that shows you the best method, respectively, to convert a CSV file to JSON, Excel, dictionary, Parquet, list, list of lists, list of tuples, text file, DataFrame, XML, NumPy array, and list of dictionaries.

Programmer Humor

It’s hard to train deep learning algorithms when most of the positive feedback they get is sarcastic. — from xkcd

Resources