How to Create a DataFrame in Pandas?


In Python’s pandas module, DataFrames are two-dimensional data objects. You can think of them as tables with rows and columns that contain data. This article provides an overview of the most common ways to create DataFrames. We follow the convention of importing pandas as pd.
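The "table" picture can be made concrete with a minimal sketch. The names and ages below are made-up illustration data. The sketch builds a small DataFrame from a dictionary of lists (one of the methods covered in this article) and inspects its shape, column labels, and default row index.

```python
import pandas as pd

# Hypothetical example data: two rows, two columns.
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [21, 27]})

print(df)
print(df.shape)          # (rows, columns) -> (2, 2)
print(list(df.columns))  # column labels -> ['Name', 'Age']
print(list(df.index))    # default integer row labels -> [0, 1]
```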


Create a DataFrame From a CSV File

The best-known way to create a DataFrame is probably the function pd.read_csv(filename).
The first line of the CSV file contains the column labels, separated by commas.
Each following line holds one row of data points, with as many values as there are columns.
To use the default settings of pd.read_csv(), the data points must be separated by commas.
Here is an example of such a CSV file:

# data.csv

column1,column2,column3
value00,value01,value02
value10,value11,value12
value20,value21,value22

The following code snippet creates a DataFrame from the data.csv file:

import pandas as pd

df = pd.read_csv('data.csv')
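To try this without a file on disk, a minimal sketch can pass an io.StringIO object to pd.read_csv() in place of a filename. The string below is a stand-in for the contents of data.csv. The second part shows a detail of the default settings: a space after a comma is kept as part of the field, and the parameter skipinitialspace=True strips it. The tiny "a, b" input is hypothetical.

```python
import io

import pandas as pd

# Stand-in for data.csv; io.StringIO lets read_csv treat a string like a file.
csv_text = """column1,column2,column3
value00,value01,value02
value10,value11,value12
value20,value21,value22
"""

df = pd.read_csv(io.StringIO(csv_text))
print(list(df.columns))  # ['column1', 'column2', 'column3']
print(df.shape)          # (3, 3): the header line is not a data row

# With default settings, a space after a comma stays in the label.
spaced = "a, b\n1, 2\n"
default_cols = list(pd.read_csv(io.StringIO(spaced)).columns)
stripped_cols = list(pd.read_csv(io.StringIO(spaced), skipinitialspace=True).columns)
print(default_cols)   # ['a', ' b']
print(stripped_cols)  # ['a', 'b']
```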

The function pd.read_table() works similarly but expects tabs as delimiters instead of commas.
By default, pandas adds an integer row index, but you can also make one of the data columns the index.
To do so, use the parameter index_col, for example: pd.read_csv('data.csv', index_col=0)
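A minimal sketch of both points, using hypothetical tab-separated data held in an io.StringIO object instead of a real file: pd.read_table() splits on tabs by default, index_col=0 turns the first column into the row index, and pd.read_csv() with sep='\t' gives the same result.

```python
import io

import pandas as pd

# Hypothetical tab-separated data standing in for a file.
tsv_text = "name\tage\nAlice\t21\nBob\t27\n"

# read_table splits on tabs; index_col=0 makes 'name' the row index.
df = pd.read_table(io.StringIO(tsv_text), index_col=0)
print(df)
print(df.loc['Bob', 'age'])  # rows are now addressed by name: 27

# read_csv with an explicit tab separator produces the same DataFrame.
same = pd.read_csv(io.StringIO(tsv_text), sep='\t', index_col=0)
print(df.equals(same))  # True
```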

Create a DataFrame From a List of Lists

A DataFrame can be created from a list of lists, where each inner list contains the data for one row.
To create the DataFrame, we pass the list of lists and a list of column labels to the DataFrame constructor:

import pandas as pd

data = [
    ['Bob', 23],
    ['Carl', 34],
    ['Dan', 14]
]
df = pd.DataFrame(data, columns=['Name', 'Age'])
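The following sketch checks how the rows map into the DataFrame and what happens when the columns argument is left out: pandas then falls back to integer column labels.

```python
import pandas as pd

data = [
    ['Bob', 23],
    ['Carl', 34],
    ['Dan', 14],
]

# Each inner list becomes one row; 'columns' assigns the labels in order.
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)

# Without 'columns', the column labels default to 0, 1, ...
unlabeled = pd.DataFrame(data)
print(list(unlabeled.columns))  # [0, 1]
```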

Create a DataFrame From a Dictionary of Lists

A DataFrame can be created from a dictionary of lists. The dictionary’s keys are the column labels, and the lists contain the data for the columns.

import pandas as pd

# columns
names = ['Alice', 'Bob', 'Carl']
ages = [21, 27, 35]

# create the dictionary of lists
data = {'Name': names, 'Age': ages}

df = pd.DataFrame(data)
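A minimal sketch of two properties of this approach: the columns appear in the dictionary's insertion order, and all lists must have the same length. If one list is shorter, the constructor raises a ValueError instead of padding the column.

```python
import pandas as pd

names = ['Alice', 'Bob', 'Carl']
ages = [21, 27, 35]

# Column order follows the dictionary's insertion order.
df = pd.DataFrame({'Name': names, 'Age': ages})
print(list(df.columns))  # ['Name', 'Age']

# Lists of different lengths are rejected.
raised = False
try:
    pd.DataFrame({'Name': names, 'Age': [21, 27]})
except ValueError as err:
    raised = True
    print('ValueError:', err)
```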

Create a DataFrame From a List of Dictionaries

A DataFrame can be created from a list of dictionaries. Each dictionary represents a row in the DataFrame. The keys of the dictionaries are the column labels, and the values are that row's entries in those columns.

import pandas as pd

data = [
    {'Car': 'Mercedes', 'Driver': 'Hamilton, Lewis'},
    {'Car': 'Ferrari', 'Driver': 'Schumacher, Michael'},
    {'Car': 'Lamborghini', 'Driver': 'Rossi, Semino'}
]
df = pd.DataFrame(data)
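Because each row is its own dictionary, the rows do not all need the same keys. In the minimal sketch below, the second (hypothetical) record has no 'Driver' key, and pandas fills that cell with NaN.

```python
import pandas as pd

# Hypothetical records; the second one lacks the 'Driver' key.
data = [
    {'Car': 'Mercedes', 'Driver': 'Hamilton, Lewis'},
    {'Car': 'Ferrari'},
]

df = pd.DataFrame(data)
print(df)

# A key missing from a dictionary becomes NaN in that row.
print(df['Driver'].isna().tolist())  # [False, True]
```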

Create a DataFrame From a List of Tuples

The DataFrame constructor can also be called with a list of tuples, where each tuple represents a row in the DataFrame. In addition, we pass a list of column labels to the parameter columns.

import pandas as pd

names = ['Alice', 'Bob', 'Clarisse', 'Dagobert']
ages = [20, 53, 42, 23]

# create a list of tuples
data = list(zip(names, ages))

df = pd.DataFrame(data, columns=['Name', 'Age'])
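To see what zip() contributes, this sketch prints the first tuple and then compares the result with the dictionary-of-lists approach applied to the same two lists. Both constructions produce the same DataFrame.

```python
import pandas as pd

names = ['Alice', 'Bob', 'Clarisse', 'Dagobert']
ages = [20, 53, 42, 23]

# zip pairs the i-th name with the i-th age.
rows = list(zip(names, ages))
print(rows[0])  # ('Alice', 20)

from_tuples = pd.DataFrame(rows, columns=['Name', 'Age'])
from_dict = pd.DataFrame({'Name': names, 'Age': ages})

# Same labels, same values, same dtypes.
print(from_tuples.equals(from_dict))  # True
```

One difference to keep in mind: zip() stops at the shortest input, so if the lists differ in length, the extra items are silently dropped rather than raising an error as the dictionary approach does.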

Summing Up

In this article, we have gone through several different ways to create DataFrames in pandas. The list is not exhaustive, however.
Choose the method that best fits your use case, that is, the one that requires the least data transformation.