How to Read and Write Excel files with Pandas - Be on the Right Side of Change

The data we use in Pandas for real-world projects often comes from outside sources. One popular and repeatedly used source is Excel. In this tutorial, we will learn how to read files from Excel, how to modify these files, and how to write them to Excel. We will use several Pandas functions for these approaches.

As you go over this article, you can play the article video for further guidance:

Reading Excel sheets into a Pandas data frame

To read data into a Pandas data frame from an Excel sheet, we use the Pandas read_excel() function. This function provides us with a wide range of parameters to read in our data in various ways. To get an overview of all the different parameters, have a look at the official documentation.

Let’s start with a simple example. The Excel data we are using looks like this:

The dataset contains fictional persons with their respective name, gender, job, and age. As we can see, there are no additional indexes, just the ones Excel provides us with (the row numbers on the left). Now we move over to our code editor to read in the dataset there:

import pandas as pd

df = pd.read_excel("excel_file.xlsx")

The DataFrame looks like this:

	name	gender	job	age
0	Alice	f	manager	31
1	John	m	software engineer	28
2	Bob	m	farmer	43
3	Tracy	f	doctor	40

First of all, we import the Pandas library. Then we create a data frame using the function read_excel() and assign it to the variable “df”. Inside the function, we put in the name of the Excel file as a string. Finally, we output the data frame. Note that to make it work like this, the Pandas script and the Excel file have to be in the same folder. If this is not the case, we have to put in the file path where the Excel file is stored.

As mentioned earlier, we have lots of parameters we can use inside the read_excel() function. One of them is the “dtype” parameter which allows us to specify the data type of a column when it gets imported. For example, we could specify the data type of the “age” column to be “float”. We do it this way:

df = pd.read_excel("excel_file.xlsx", dtype={"age": float})

This results in the following:

	name	gender	job	age
0	Alice	f	manager	31.0
1	John	m	software engineer	28.0
2	Bob	m	farmer	43.0
3	Tracy	f	doctor	40.0

We assign the “dtype” parameter a dictionary of all the column data types, we want to specify. We pass in the column’s name as the key and the desired data type as the value of the dictionary. As we can see in the output, the data type indeed changed to float because all the “age” values now have a “.0” at the end. To be 100% sure that the data type is “float”, we can check it this way:

>>> df["age"].dtype
dtype('float64')

The data type of the “age” column is “float64”.

The Excel file contains two tables, namely “Table 1” and “Table 2”.

By default, the read_excel() function chooses the first table which contained the dataset we have seen above. Using the “sheet_name” parameter, we can specify which table we want to choose. So now, let’s create a second data frame with the dataset from “Table 2”. This is what the dataset looks like in the Excel file:

Now we read in another data frame applying the “sheet_name” parameter:

df2 = pd.read_excel("excel_file.xlsx", sheet_name="Table 2")

	Unnamed: 0	animal	food	habitat
0	1	dog	dog food	land
1	2	sea turtle	seaweeds	sea
2	3	dolphin	fish	sea
3	4	eagle	mouse	sky

We assign the name of the table we want to access as a string to the “sheet_name” parameter. But what is that second column? Pandas created an additional column called “Unnamed: 0”. Note that this dataset’s first column in the Excel file has its own indexes, unlike the first dataset which did not have its own indexes. We can fix this using the “index_col” parameter. Here we specify the column which to use for the row labels. We specify this parameter when our dataset has its own indexes:

df2 = pd.read_excel("excel_file.xlsx", sheet_name="Table 2", 
                    index_col=0)

	animal	food	habitat
1	dog	dog food	land
2	sea turtle	seaweeds	sea
3	dolphin	fish	sea
4	eagle	mouse	sky

We apply “0” to the “index_col” parameter to tell the program we want to use the first row of the dataset as the indexes for the data frame.

Modifying the Imported Files with Pandas

After we imported the Excel sheets and created new data frames from them, we can analyze and modify our data frames with Pandas. This step is not essential for this tutorial. However, we will take this step anyway just to give you an understanding of the whole workflow of reading in an Excel sheet, modifying the data, and writing the modified data back to an Excel sheet.

There are endless opportunities to perform here, but we will keep it simple and change one thing only. Let’s take the data frame we used in the first place:

This is how our basic DataFrame, named df, looks:

	name	gender	job	age
0	Alice	f	manager	31.0
1	John	m	software engineer	28.0
2	Bob	m	farmer	43.0
3	Tracy	f	doctor	40.0

Now, we modify the data frame and delete the “gender” column:

df3 = df.drop(columns=["gender"])

	name	job	age
0	Alice	manager	31.0
1	John	software engineer	28.0
2	Bob	farmer	43.0
3	Tracy	doctor	40.0

We got rid of the “gender” column by applying the drop() function. Inside the function, we assigned the column “gender” as a list to the “columns” parameter to state which column we want to drop. The outputted data frame shows the initial data frame without the “gender” column.

Now we changed the data frame a little bit. In the next step, we will see how to write this new data frame to an Excel sheet.

Writing Objects to an Excel Sheet

When we are done with our data analysis in Pandas and modified our data frames, we write these data frames to an Excel sheet to make them better available and visible. This is where the to_excel() function comes into play. For an overview of all the parameters, have a look at the official documentation. The approach is very straightforward:

df3.to_excel("new_excel_file.xlsx")

We choose the data frame that we want to export to an Excel sheet. We are using “df3” which we created in the last paragraph. Inside the to_excel() function, we put in the name of the file (“new_excel_file”) and the file extension (“.xlsx”) as a string. Note that we do not get an output here. Instead, a new file gets created and it is stored in the same folder as the Pandas script. This is what the file looks like:

As we can see, there are two different types of indexes here: The indexes in column “A” and the Excel indexes on the left of column “A”. The extra indexes in column “A” are redundant and we want to get rid of them, so we modify the to_excel() function a little bit by using the “index” parameter:

df3.to_excel("new_excel_file.xlsx", index=False)

We assigned the value “False” to the “index” parameter to state that we do not want to use the Pandas’ indexes. The Excel file now looks like this:

In addition to that, we can specify the sheet name by applying the “sheet_name” parameter:

df3.to_excel("new_excel_file.xlsx", index=False, sheet_name="Table 1")

We call the sheet “Table 1”. At the bottom of the file, we can see the new sheet name:

Writing Multiple Objects to Excel Files

We have seen how to write a data frame to an Excel file using the to_excel() function. Using this method allows us to write one object only, but we might want to write multiple objects to Excel. Therefore, we use an ExcelWriter object. ExcelWriter is a class for writing data frames into Excel sheets. It is necessary for writing multiple data frames into Excel sheets and modifying existing Excel sheets (official docs). Let’s start with an example of two data frames being written into a new Excel sheet. We are using the data frames “df2” and “df3” from above:

with pd.ExcelWriter("new_file_2.xlsx") as writer:
    df2.to_excel(writer, sheet_name="Table 1", index=False)
    df3.to_excel(writer, sheet_name="Table 2", index=False)

The with statement is a Python statement used for exception handling and makes our code easier to read. The code here says, in the first line we create a new Excel file (“new_file_2.xlsx”) using the ExcelWriter and assign it to a variable called “writer”. In the inserted two lines afterward, we write the two data frames into the newly created Excel file using the to_excel() function that we already know. Inside the to_excel() function we respectively put in the variable “writer” as the path. We also use the “sheet_name” parameter and the respective name of the sheet and set the “index” parameter to “False” to get rid of the extra indexes. This is what the first sheet in the new file looks like in Excel:

And this is the second sheet:

The ExcelWriter also allows us to append to an existing Excel file. We will see how to do this by appending the data frame “df” we used at the very beginning to the existing file “new_file_2.xslx” in a new sheet “Table 3”:

with pd.ExcelWriter("new_file_2.xlsx", mode="a") as writer:
    df.to_excel(writer, sheet_name="Table 3", index=False)

There is only one real difference to the code from the example before. We only add the parameter “mode” inside the ExcelWriter and assign it the string "a" which stands for “append”. So here, we are appending the data frame “df” to the existing Excel file and store it in a new sheet called “Table 3”. This is what the data frame in sheet 3 looks like:

And as you can see, we indeed have three sheets within this file:

Summary

In summary, it can be said that working properly with Excel files is very essential when using Pandas. The datasets we want to analyze often come from Excel sheets and when we are done analyzing, the modified datasets often have to be written back to Excel sheets to make the data more accessible. Using the shown Excel functions that Pandas provides us with make these processes pretty straightforward for us and they help us to provide a good and clear workflow when working with Excel files.

For more tutorials about Pandas, Python libraries, Python in general, or other computer science-related topics, check out the Finxter Blog page and check out the free Finxter email academy:

Happy Coding!