In Python’s pandas module, DataFrames are two-dimensional data objects. You can think of them as tables with rows and columns that contain data.
This article provides an overview of the most common ways to instantiate DataFrames.
Note: We follow the convention of importing pandas under the alias pd.
import pandas as pd
Create a DataFrame From a CSV File
Reading a CSV file with the function pd.read_csv(filename) is probably the best-known way to create a DataFrame.
The first line of the CSV file contains the column labels, separated by commas.
The lines after it contain the data points, with as many values per line as there are columns.
With the default settings of pd.read_csv(), the data points must be separated by commas.
Here is an example of such a CSV file:
# data.csv
column1,column2,column3
value00,value01,value02
value10,value11,value12
value20,value21,value22
The following code snippet creates a DataFrame from the data.csv file:
import pandas as pd

df = pd.read_csv('data.csv')
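To try this without creating a file on disk, the sketch below passes a file-like object to pd.read_csv() instead of a path. The variable csv_text is a hypothetical in-memory stand-in for the contents of data.csv.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the contents of data.csv
csv_text = (
    "column1,column2,column3\n"
    "value00,value01,value02\n"
    "value10,value11,value12\n"
    "value20,value21,value22\n"
)

# read_csv accepts file-like objects as well as file paths
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # labels taken from the first line
```

The header line is not counted as a data row, so the DataFrame has three rows and three columns.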
The function pd.read_table() is similar but expects tabs as delimiters instead of commas.
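As a rough illustration, the following sketch reads the same tab-separated text once with pd.read_table() and once with pd.read_csv() and an explicit sep argument. The sample text tsv_text is made up for this example.

```python
import io

import pandas as pd

# Made-up tab-separated sample data
tsv_text = "column1\tcolumn2\nvalue00\tvalue01\nvalue10\tvalue11\n"

# read_table splits on tabs by default
df_table = pd.read_table(io.StringIO(tsv_text))

# read_csv gives the same result when told to split on tabs
df_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(df_table.equals(df_csv))
```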
By default, pandas adds an integer row index. You can instead use one of the data columns as the index by passing the parameter index_col.
Example: pd.read_csv('data.csv', index_col=0)
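The sketch below shows the effect of index_col on a small in-memory sample (the text in csv_text is made up for illustration). The index column can be selected by position or by label.

```python
import io

import pandas as pd

# Made-up sample data, held in memory instead of a file
csv_text = (
    "column1,column2,column3\n"
    "value00,value01,value02\n"
    "value10,value11,value12\n"
)

# Select the index column by position ...
df_pos = pd.read_csv(io.StringIO(csv_text), index_col=0)

# ... or by its label
df_name = pd.read_csv(io.StringIO(csv_text), index_col="column1")

print(df_pos.index.tolist())             # values of the first column
print(df_pos.loc["value10", "column2"])  # look up a row by its index label
```

The chosen column no longer appears among the regular columns; its values become the row labels.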
Recommended Tutorial: How to Return a DataFrame from a Function?
Create a DataFrame From a List of Lists
A DataFrame can be created from a list of lists where each list in the outer list contains the data for one row.
To create the DataFrame, we pass the list of lists and a list of column labels to the DataFrame constructor:
import pandas as pd

data = [
    ['Bob', 23],
    ['Carl', 34],
    ['Dan', 14]
]

df = pd.DataFrame(data, columns=['Name', 'Age'])
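To see what the columns argument contributes, this small sketch builds the DataFrame twice, once without labels and once with them. The names df_default and df_named are chosen only for this illustration.

```python
import pandas as pd

data = [['Bob', 23], ['Carl', 34], ['Dan', 14]]

# Without column labels, pandas numbers the columns 0, 1, ...
df_default = pd.DataFrame(data)

# With explicit labels, one per inner-list position
df_named = pd.DataFrame(data, columns=['Name', 'Age'])

print(list(df_default.columns))
print(list(df_named.columns))
print(df_named.shape)  # one row per inner list
```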
Recommended Tutorial: Python List of Lists to DataFrame
Create a DataFrame From a Dictionary of Lists
A DataFrame can be created from a dictionary of lists. The dictionary's keys are the column labels, and the lists contain the data for the columns.
import pandas as pd

# columns
names = ['Alice', 'Bob', 'Carl']
ages = [21, 27, 35]

# create the dictionary of lists
data = {'Name': names, 'Age': ages}

df = pd.DataFrame(data)
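One thing to watch for with this approach is that all lists need the same length. The sketch below shortens one list on purpose to show how the constructor reacts; the flag lengths_ok is a hypothetical helper for this example.

```python
import pandas as pd

names = ['Alice', 'Bob', 'Carl']
ages = [21, 27]  # one value short on purpose

try:
    pd.DataFrame({'Name': names, 'Age': ages})
    lengths_ok = True
except ValueError as err:
    # pandas refuses to build columns of different lengths
    lengths_ok = False
    print(f"Could not build the DataFrame: {err}")
```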
Recommended Tutorial: Python List of Dicts to DataFrame
Create a DataFrame From a List of Dictionaries
A DataFrame can be created from a list of dictionaries. Each dictionary represents a row in the DataFrame. The keys in the dictionaries are the column labels and the values are the values for the columns.
import pandas as pd

data = [
    {'Car': 'Mercedes', 'Driver': 'Hamilton, Lewis'},
    {'Car': 'Ferrari', 'Driver': 'Schumacher, Michael'},
    {'Car': 'Lamborghini', 'Driver': 'Rossi, Semino'}
]

# each dictionary becomes one row
df = pd.DataFrame(data)
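The dictionaries do not have to share exactly the same keys. As a minimal sketch with shortened sample data, the second dictionary below omits the 'Driver' key, and pandas fills the gap with a missing value (NaN).

```python
import pandas as pd

data = [
    {'Car': 'Mercedes', 'Driver': 'Hamilton, Lewis'},
    {'Car': 'Ferrari'},  # no 'Driver' key in this row
]

df = pd.DataFrame(data)

print(df)
print(df['Driver'].isna().tolist())  # marks the rows without a driver
```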
Create a DataFrame From a List of Tuples
The DataFrame constructor can also be called with a list of tuples where each tuple represents a row in the DataFrame. In addition, we pass a list of column labels to the parameter columns.
import pandas as pd

names = ['Alice', 'Bob', 'Clarisse', 'Dagobert']
ages = [20, 53, 42, 23]

# create a list of tuples
data = list(zip(names, ages))

df = pd.DataFrame(data, columns=['Name', 'Age'])
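When the tuples are built with zip(), keep in mind that zip() stops at the shortest input. The sketch below drops one age on purpose to show that the extra name is left out silently rather than raising an error.

```python
import pandas as pd

names = ['Alice', 'Bob', 'Clarisse', 'Dagobert']
ages = [20, 53, 42]  # one age missing on purpose

# zip() pairs items only up to the length of the shortest list
data = list(zip(names, ages))

df = pd.DataFrame(data, columns=['Name', 'Age'])

print(len(df))               # number of rows actually created
print(df['Name'].tolist())   # 'Dagobert' is not included
```

On Python 3.10 or newer, zip(names, ages, strict=True) raises a ValueError in this situation instead, which may be preferable when the lists are expected to match.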
Summing Up
In this article, we have gone through a range of different ways to create DataFrames in pandas. The list is not exhaustive, however.
You should choose the method that best fits your use case, that is, the one that requires the least data transformation.