π¬ Question: How to import a CSV file to a Pandas DataFrame in Python?
This article will discuss the most interesting examples to read a CSV file to a Pandas DataFrame.
If not specified otherwise, we use the following CSV file for all examples:
my_file.csv
:
Name,Job,Age,Income
Alice,Programmer,23,110000
Bob,Executive,34,90000
Carl,Sales,45,50000
Let’s get started!
Example 1 – CSV to DataFrame
To import a given CSV file into a newly-created Pandas DataFrame, use the pd.read_csv('my_file.csv')
function that returns a DataFrame created with the content in the CSV file 'my_file.csv'
.
Here’s a quick and generic code snippet showcasing this approach:
import pandas as pd df = pd.read_csv('my_file.csv') print(df)
Output:
Name Job Age Income
0 Alice Programmer 23 110000
1 Bob Executive 34 90000
2 Carl Sales 45 50000
You convert the CSV file to a DataFrame in three steps: (1) import the pandas library, (2) use pd.read_csv()
and pass the filename as a string argument, and (3) print the resulting DataFrame (optional).
import pandas as pd
df = pd.read_csv('my_file.csv')
print(df)
Example 2 – Importing Specific Column from CSV
If you want to import a specific column in the CSV, use the argument usecols=['column_name']
.
The following example only reads the column 'Name'
using this approach:
import pandas as pd df = pd.read_csv('my_file.csv', usecols= ['Name']) print(df)
Output:
Name
0 Alice
1 Bob
2 Carl
Example 3 – Importing Subset of Columns from CSV
If you want to import a subset of columns from the CSV, use the argument usecols= ['column_name_1','column_name_2']
.
import pandas as pd df = pd.read_csv('my_file.csv', usecols= ['Name','Age']) print(df)
Output:
Name Age
0 Alice 23
1 Bob 34
2 Carl 45
Example 4 – Using Custom Separator for CSV File
You can specify a custom separator for the CSV file by adding the optional argument sep
.
- To use the tabular separator, use
sep='\t'
as an argument forpd.read_csv()
. - To use the semicolon separator, use
sep=';'
as an argument forpd.read_csv()
. - The default separator argument of
pd.read_csv()
issep=','
.
import pandas as pd df = pd.read_csv('my_file.csv', sep='\t') print(df)
If the tabular character was the separator in the file 'my_file.csv'
, this example will generate the same DataFrame as already shown in example 1:
Name Job Age Income
0 Alice Programmer 23 110000
1 Bob Executive 34 90000
2 Carl Sales 45 50000
Example 5 – Read CSV with Custom Headers
Per default, the first row of the CSV file will be used as a header of the DataFrame by the df.read_csv()
function.
To read a CSV to a DataFrame with custom headers, call pd.read_csv("my_file.csv", skiprows=1, names=column_names)
.
skiprows=1
ensures that you’ll skip the first row that contains the default column names. We’ll explore this in more detail below.names=column_names
passes the column names stored in the variable as a list['Spouse', 'Former Job', 'Insta Followers', 'Insta Likes']
. These are the new column names (=headers) of the newly-created DataFrame.
Here’s an example:
import pandas as pd column_names = ['Spouse', 'Former Job', 'Insta Followers', 'Insta Likes'] df = pd.read_csv("my_file.csv", skiprows=1, names=column_names) print(df)
Output:
Spouse Former Job Insta Followers Insta Likes
0 Alice Programmer 23 110000
1 Bob Executive 34 90000
2 Carl Sales 45 50000
Example 6 – Skip First n Rows from CSV using Custom Headers
import pandas as pd n = 2 # no. of rows to skip column_names = ['Spouse', 'Former Job', 'Insta Followers', 'Insta Likes'] df = pd.read_csv("my_file.csv", skiprows=2, names=column_names) print(df)
Output:
Spouse Former Job Insta Followers Insta Likes
0 Bob Executive 34 90000
1 Carl Sales 45 50000
The first row (=original header) and the second row (=first line of CSV) are now skipped when setting skiprows=2
.
Example 7 – Read CSV with Header in i-th Row
To read the CSV file with the header in the i
-th row, you can pass the row index i
as header argument like so: pd.read_csv("my_file.csv", header = i)
. The i
-th row will now be taken as the column headers of the newly-created DataFrame.
The following example shows how to read a CSV to a DataFrame with the header in the second row with index i=1
:
import pandas as pd i = 1 # header row index df = pd.read_csv("my_file.csv", header = i) print(df)
Output:
Alice Programmer 23 110000
0 Bob Executive 34 90000
1 Carl Sales 45 50000
In this example, the second row of the CSV is taken as a header. The first row is ignored altogether.
Example 8 – Read CSV Without Header
If you specify header=None
when calling pd.read_csv("my_file.csv", header=None)
, Python will assign successive integer values (0, 1, 2, …) as column names.
import pandas as pd df = pd.read_csv("my_file.csv", header=None) print(df)
Output:
0 1 2 3
0 Name Job Age Income
1 Alice Programmer 23 110000
2 Bob Executive 34 90000
3 Carl Sales 45 50000
Example 9 – Read CSV Custom Header Names: Prefix
If you specify header=None
when calling pd.read_csv("my_file.csv", header=None)
, Python will assign successive integer values (0, 1, 2, …) as column names.
If you want to modfiy the column names in a batch, you can additionally specify the prefix argument, i.e., a prefix string that will be inserted before each successive integer column name.
For example, prefix='Col_'
in combination with header=None
will yield the column names:
'Col_0'
'Col_1'
'Col_2'
- …
Here’s the code specification:
import pandas as pd df = pd.read_csv("my_file.csv", header=None, prefix='Col_') print(df)
Output:
Col_0 Col_1 Col_2 Col_3
0 Name Job Age Income
1 Alice Programmer 23 110000
2 Bob Executive 34 90000
3 Carl Sales 45 50000
Example 10 – Set Specific Values as NaN (Missing Value)
If you want to set specific values in the CSV as “Not a Number” (NaN
), you can use the list argument na_values=[val_1, ...]
that has an element for each value to be set as NaN
when loading the CSV into a DataFrame.
Here’s an example setting the values 'Alice'
and 23
to NaN
.
import pandas as pd df = pd.read_csv("my_file.csv", na_values=['Alice', 23]) print(df)
Output:
Name Job Age Income
0 NaN Programmer NaN 110000
1 Bob Executive 34.0 90000
2 Carl Sales 45.0 50000
The missing values are highlighted in bold.
Example 11 – Load CSV at URL to DataFrame
Given the string URL with the location of the desired CSV file (e.g., "https://www.my_url.com/my_file.csv"
). You can load the contents into a DataFrame by passing the URL as first argument of the pd.read_csv()
function instead of the local file path.
Pandas will automagially figure out that it’s an URL. π
import pandas as pd df = pd.read_csv("https://www.my_url.com/my_file.csv") print(df)
Output — assuming the CSV file actually exists:
Name Job Age Income
0 Alice Programmer 23 110000
1 Bob Executive 34 90000
2 Carl Sales 45 50000
Example 12 – Skip Last i Rows Loading CSV to DataFrame
You can skip the last i
rows by setting the skipfooter=i
argument in the pd.read_csv()
function. The first row of the CSV will still be the header used for the column names. But the last i
rows are ignored from the CSV.
Here’s an example:
import pandas as pd df = pd.read_csv("my_file.csv", skipfooter=2) print(df)
Output:
Name Job Age Income
0 Alice Programmer 23 110000
The first row is still taken as header. But the last i=2
rows are skipped, so the remaining DataFrame has only one row.
Example 13 – Ignoring Comment Lines When Loading CSV to DataFrame
Assume you have a CSV file with some comments (e.g., starting with the hashtag character '#'
).
'my_file.csv'
:
Name,Job,Age,Income
Alice,Programmer,23,110000
# Ignore this line
Bob,Executive,34,90000
Carl,Sales,45,50000
# This line is ignored too
You can load the CSV and ignore lines that start with a specific “commenting character” such as the symbol '#'
by passing the argument comment='#'
into the pd.read_csv()
function.
Here’s an example:
import pandas as pd df = pd.read_csv("my_file.csv", comment="#") print(df)
The output is the DataFrame with commented lines ignored:
Name Job Age Income
0 Alice Programmer 23 110000
1 Bob Executive 34 90000
2 Carl Sales 45 50000
Without setting the comment argument, the DataFrame would attempt to load commented lines which can yield a very odd behavior in practical code projects:
Name Job Age Income
0 Alice Programmer 23.0 110000.0
1 # Ignore this line NaN NaN NaN
2 Bob Executive 34.0 90000.0
3 Carl Sales 45.0 50000.0
4 # This line is ignored too NaN NaN NaN
The highlighted lines are clearly not the desired behavior, so use the comment
argument accordingly to fix this!
Example 14 – Measure Elapsed Runtime when Importing CSV to DataFrame
The verbose=True
argument of the pd.read_csv()
function allows you to measure the elapsed runtime when loading a CSV file.
For small CSV files, the elapsed runtime will always yield 0 due to the high speed efficiency. But for performance-heavy projects or large-scale CSV files, this argument can make your life much easier!
import pandas as pd df = pd.read_csv("my_file.csv", verbose=True) print(df)
Output:
Tokenization took: 0.00 ms
Type conversion took: 0.00 ms
Parser memory cleanup took: 0.00 ms
Name Job Age Income
0 Alice Programmer 23 110000
1 Bob Executive 34 90000
2 Carl Sales 45 50000
In our case, the elapsed runtime was real quick due to the small CSV file. But for larger, real-world CSV files, this can give you a great idea on your code’s bottlenecks for performance analysis.
Example 15 – Pandas DataFrame to CSV
Given a Pandas DataFrame. How to save it in a CSV file?
You can convert a list of lists to a Pandas DataFrame that provides you with powerful capabilities such as the to_csv()
method. This is the easiest method and it allows you to avoid importing yet another library (I use Pandas in many Python projects anyways).
salary = [['Alice', 'Data Scientist', 122000], ['Bob', 'Engineer', 77000], ['Ann', 'Manager', 119000]] import pandas as pd df = pd.DataFrame(salary) df.to_csv('file2.csv', index=False, header=False)
Output:
# file2.csv
Alice,Data Scientist,122000
Bob,Engineer,77000
Ann,Manager,119000
π Learn More: You can learn more about this approach in our detailed Finxter blog article.
Example 16 – CSV to DataFrame Setting Specific Values as True
You can set specific values as True
when loading the CSV file to a DataFrame by defining the true_values
argument. Simply pass a list of values to be used as True
values in the pd.read_csv()
function.
For example, true_values=['Alice', 'Bob']
will set each occurrence of either 'Alice'
or 'Bob'
as True
in the DataFrame.
import pandas as pd df = pd.read_csv("my_file.csv", true_values=['Alice', 'Bob']) print(df)
Output:
Name Job Age Income
0 True Programmer 23 110000
1 True Executive 34 90000
2 Carl Sales 45 50000
Example 17 – CSV to DataFrame Setting Specific Values as False
You can set specific values as False
when loading the CSV file to a DataFrame by defining the false_values
argument. Simply pass a list of values to be used as False
values in the pd.read_csv()
function.
For example, false_values=['Alice', 'Bob']
will set each occurrence of either 'Alice'
or 'Bob'
as False
in the DataFrame.
import pandas as pd df = pd.read_csv("my_file.csv", false_values=['Alice', 'Bob']) print(df)
Output:
Name Job Age Income
0 False Programmer 23 110000
1 False Executive 34 90000
2 Carl Sales 45 50000
Related Video
More Python CSV Conversions
π Learn More: I have compiled an “ultimate guide” on the Finxter blog that shows you the best method, respectively, to convert a CSV file to JSON, Excel, dictionary, Parquet, list, list of lists, list of tuples, text file, DataFrame, XML, NumPy array, and list of dictionaries.