Pandas is a great library for data analysis in Python. With Pandas, you can create visualizations, filter rows or columns, add new columns, and save the data in a wide range of formats. The workhorse of Pandas is the DataFrame.
So the first step in working with Pandas is often to get our data into a DataFrame. If we have data stored in lists, how can we create this all-powerful DataFrame?
There are 4 basic strategies:
- Create a dictionary with column names as keys and your lists as values. Pass this dictionary as an argument when creating the DataFrame.
- Pass your lists into the zip() function. As with strategy 1, your lists will become columns in the DataFrame.
- Put your lists into a list instead of a dictionary. In this case, your lists become rows instead of columns.
- Create an empty DataFrame and add columns one by one.
Create a DataFrame using a dictionary

The first step is to import pandas. If you haven’t already, install pandas first.
import pandas as pd
Let’s say you have employee data stored as lists.
# if your data is stored like this
employee = ['Betty', 'Veronica', 'Archie', 'Jughead']
salary = [110_000, 20_000, 80_000, 70_000]
bonus = [1000, 500, 2500, 400]
tax_rate = [.1, .25, .17, .4]
absences = [0, 1, 0, 52]
Build a dictionary using column names as keys and your lists as values.
# you can easily create a dictionary that will define your dataframe
emp_data = {
    'name': employee,
    'salary': salary,
    'bonus': bonus,
    'tax_rate': tax_rate,
    'absences': absences
}

# pass the dictionary to the DataFrame constructor
emp_df = pd.DataFrame(emp_data)
emp_df
Your lists will become columns in the resulting DataFrame.
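As a quick check of the dictionary approach, the sketch below builds a small frame and inspects its shape and column order. The shortened sample data and the name bad_data are made up for illustration. It also shows that pandas raises a ValueError when the lists differ in length, so it can help to compare lengths first.

```python
import pandas as pd

# illustrative sample data (a shortened version of the employee lists)
employee = ['Betty', 'Veronica']
salary = [110_000, 20_000]

emp_df = pd.DataFrame({'name': employee, 'salary': salary})

# each list became a column; the dictionary's key order sets the column order
print(emp_df.shape)          # (2, 2)
print(list(emp_df.columns))  # ['name', 'salary']

# lists of unequal length cannot be combined this way
bad_data = {'name': employee, 'salary': [110_000]}
try:
    pd.DataFrame(bad_data)
    length_error = False
except ValueError:
    length_error = True
print(length_error)  # True
```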

Create a DataFrame using the zip function

Pass each list as a separate argument to the zip() function. You can specify the column names using the columns parameter or by setting the columns property on a separate line.
emp_df = pd.DataFrame(zip(employee, salary, bonus, tax_rate, absences))
emp_df.columns = ['name', 'salary', 'bonus', 'tax_rate', 'absences']
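The columns parameter mentioned above can be sketched as follows. This is a minimal example with only two of the lists. Wrapping zip() in list() is a cautious choice made here so the rows can be inspected before building the frame, and rows is an illustrative name.

```python
import pandas as pd

employee = ['Betty', 'Veronica', 'Archie', 'Jughead']
salary = [110_000, 20_000, 80_000, 70_000]

# list(...) materializes the zip iterator into a list of row tuples
rows = list(zip(employee, salary))

# name the columns in the constructor instead of on a separate line
emp_df = pd.DataFrame(rows, columns=['name', 'salary'])
print(emp_df)
```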
The zip() function creates an iterator. For the first iteration, it grabs every value at index 0 from each list. This becomes the first row in the DataFrame. Next, it grabs every value at index 1 and this becomes the second row. This continues until it exhausts the shortest list.
We can loop through the iterator to see how this works.
# enumerate() supplies the running index i for each zipped tuple
for i, value in enumerate(zip(employee, salary, bonus, tax_rate, absences)):
    print(f'zipped value at index {i}: {value}')
Each of these values becomes a row in the DataFrame:
zipped value at index 0: ('Betty', 110000, 1000, 0.1, 0)
zipped value at index 1: ('Veronica', 20000, 500, 0.25, 1)
zipped value at index 2: ('Archie', 80000, 2500, 0.17, 0)
zipped value at index 3: ('Jughead', 70000, 400, 0.4, 52)
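Because zip() stops at the shortest list, a list that is missing a value silently drops rows rather than raising an error. The sketch below uses a deliberately shortened salary list (an assumption for illustration) to show the effect and a simple length check that guards against it.

```python
employee = ['Betty', 'Veronica', 'Archie', 'Jughead']
salary = [110_000, 20_000, 80_000]  # one value missing on purpose

rows = list(zip(employee, salary))
print(len(rows))  # 3
print(rows[-1])   # ('Archie', 80000)

# a simple guard: compare lengths before zipping
same_length = len(employee) == len(salary)
print(same_length)  # False
```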
Create a DataFrame using a list of lists
What if you have a separate list for each employee? In this case, we can just create a list of lists. Each of the inner lists becomes a row in the DataFrame.
# lists for employees instead of features
betty = ['Betty', 110000, 1000, 0.1, 0]
veronica = ['Veronica', 20000, 500, 0.25, 1]
archie = ['Archie', 80000, 2500, 0.17, 0]
jughead = ['Jughead', 70000, 400, 0.4, 52]

emp_df = pd.DataFrame([betty, veronica, archie, jughead])
emp_df.columns = ['name', 'salary', 'bonus', 'tax_rate', 'absences']
emp_df

Create a DataFrame using a list of dictionaries

If the employee data is stored in dictionaries instead of lists, we use a list of dictionaries.
betty = {'name': 'Betty', 'salary': 110000, 'bonus': 1000, 'tax_rate': 0.1, 'absences': 0}
veronica = {'name': 'Veronica', 'salary': 20000, 'bonus': 500, 'tax_rate': 0.25, 'absences': 1}
archie = {'name': 'Archie', 'salary': 80000, 'bonus': 2500, 'tax_rate': 0.17, 'absences': 0}
jughead = {'name': 'Jughead', 'salary': 70000, 'bonus': 400, 'tax_rate': 0.4, 'absences': 52}

pd.DataFrame([betty, veronica, archie, jughead])

The columns are determined by the keys in the dictionaries. What if the dictionaries don’t all have the same keys?
betty = {'name': 'Betty', 'salary': 110000, 'bonus': 1000, 'tax_rate': 0.1, 'absences': 0, 'hire_date': '2001-01-01'}
veronica = {'name': 'Veronica', 'salary': 20000, 'bonus': 500, 'tax_rate': 0.25, 'absences': 1}
archie = {'name': 'Archie', 'salary': 80000, 'bonus': 2500, 'tax_rate': 0.17, 'absences': 0, 'title': 'Vice Chief Leader'}
jughead = {'name': 'Jughead', 'salary': 70000, 'bonus': 400, 'tax_rate': 0.4, 'absences': 52, 'rank': 'yes'}

pd.DataFrame([betty, veronica, archie, jughead])

All of the keys will be used. Any time pandas encounters a dictionary with a missing key, it fills the missing value with NaN, which stands for ‘not a number’.
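To see the NaN behavior concretely, the sketch below uses two trimmed-down dictionaries (illustrative, not the full records above), checks which cells are missing with isna(), and optionally replaces them with fillna(). The 'unknown' placeholder is an arbitrary choice for this example.

```python
import pandas as pd

betty = {'name': 'Betty', 'salary': 110000, 'hire_date': '2001-01-01'}
veronica = {'name': 'Veronica', 'salary': 20000}

emp_df = pd.DataFrame([betty, veronica])

# Veronica has no 'hire_date' key, so her cell holds NaN
print(emp_df['hire_date'].isna().tolist())  # [False, True]

# optionally replace the missing values with a placeholder
filled = emp_df.fillna({'hire_date': 'unknown'})
print(filled.loc[1, 'hire_date'])  # unknown
```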
Create an empty DataFrame and add columns one by one
This method might be preferable if you need to create a lot of new calculated columns. Here we create a new column for after-tax income.
emp_df = pd.DataFrame()
emp_df['name'] = employee
emp_df['salary'] = salary
emp_df['bonus'] = bonus
emp_df['tax_rate'] = tax_rate
emp_df['absences'] = absences

income = emp_df['salary'] + emp_df['bonus']
emp_df['after_tax'] = income * (1 - emp_df['tax_rate'])
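As a worked check of the after-tax calculation, the sketch below repeats the same steps for the first two employees only, so the arithmetic is easy to follow by hand. Rounding to two decimals is added here just to keep floating-point noise out of the printed result.

```python
import pandas as pd

emp_df = pd.DataFrame()
emp_df['salary'] = [110_000, 20_000]
emp_df['bonus'] = [1000, 500]
emp_df['tax_rate'] = [.1, .25]

# column arithmetic is applied row by row
income = emp_df['salary'] + emp_df['bonus']
emp_df['after_tax'] = income * (1 - emp_df['tax_rate'])

# Betty: (110000 + 1000) * 0.90, Veronica: (20000 + 500) * 0.75
print(emp_df['after_tax'].round(2).tolist())  # [99900.0, 15375.0]
```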
How to add a list to an existing DataFrame
Here is a neat trick. If you want to edit a row in a DataFrame, you can use the handy loc indexer. loc allows you to access rows and columns by their index labels.
To access a row:
emp_df.loc[3]
Output is the row with index value 3 as a Series:
name Jughead
salary 70000
bonus 400
tax_rate 0.4
absences 52
Name: 3, dtype: object
To access a column, pass in the column name as the second index. Note that we have to specify both the row and column indexes. The format is [rows, columns]. If you want all rows, you can use “:” as we do here. The “:” also works if you want all columns.
emp_df.loc[:, 'salary']
The output is also a Series:
0    110000
1     20000
2     80000
3     70000
Name: salary, dtype: int64
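The [rows, columns] format can be sketched on a two-column version of the employee data (trimmed here for brevity). The variable names row, column, cell, and subset are illustrative.

```python
import pandas as pd

emp_df = pd.DataFrame({
    'name': ['Betty', 'Veronica', 'Archie', 'Jughead'],
    'salary': [110_000, 20_000, 80_000, 70_000],
})

row = emp_df.loc[3]                # one row, returned as a Series
column = emp_df.loc[:, 'salary']   # all rows of one column
cell = emp_df.loc[3, 'salary']     # a single value
subset = emp_df.loc[[0, 1], ['name', 'salary']]  # several rows and columns

print(row['name'])      # Jughead
print(column.tolist())  # [110000, 20000, 80000, 70000]
print(cell)             # 70000
print(subset.shape)     # (2, 2)
```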
So how do we use loc to add a new row? If we use a row index that doesn’t exist in the DataFrame, it will create a new row for us.
new_emp = ['Fonzie', 200000, 30000, .05, 112]
emp_df.loc[4] = new_emp
emp_df
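One thing to watch when adding a row this way: the list needs exactly one value per column. The sketch below, on a small two-column frame made up for illustration, adds a row and then shows that a list of the wrong length is rejected with a ValueError.

```python
import pandas as pd

emp_df = pd.DataFrame({
    'name': ['Betty', 'Veronica'],
    'salary': [110_000, 20_000],
})

# label 2 does not exist yet, so loc appends a new row
emp_df.loc[2] = ['Archie', 80_000]
n_rows = len(emp_df)
print(n_rows)  # 3

# a list with the wrong number of values is rejected
try:
    emp_df.loc[3] = ['Jughead', 70_000, 400]  # three values, two columns
    mismatch_error = False
except ValueError:
    mismatch_error = True
print(mismatch_error)  # True
```

If your DataFrame has picked up extra calculated columns, the new row needs a value for those columns as well.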

You can also update existing data with loc. Let’s lower Fonzie’s salary. It looks a bit excessive.
emp_df.loc[4, 'salary'] = 105000
emp_df
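The same pattern extends to several cells at once. As a minimal sketch on made-up data, the example below updates a single cell and then two columns of the same row by passing a list of column names.

```python
import pandas as pd

emp_df = pd.DataFrame({
    'name': ['Betty', 'Veronica'],
    'salary': [110_000, 20_000],
    'bonus': [1000, 500],
})

# update one cell: row label 1, column 'salary'
emp_df.loc[1, 'salary'] = 25_000

# update two columns of the same row in one assignment
emp_df.loc[1, ['salary', 'bonus']] = [26_000, 600]

print(emp_df.loc[1, 'salary'])  # 26000
print(emp_df.loc[1, 'bonus'])   # 600
```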

That’s more like it.
Conclusion
There are many different ways of creating a DataFrame. We looked at several methods using data stored in lists. Each will get the job done.
The most convenient method will depend on what your lists represent.
If each of your lists would best be represented as a column, then a dictionary of lists might be the easiest way to go.
If each of your lists would best be represented as a row, then a list of lists would be a good choice.
To add data in a list as a new row in an existing DataFrame, the loc indexer comes in handy. loc is also useful for updating existing data.