The data we use in Pandas for real-world projects often comes from outside sources. One popular and repeatedly used source is Excel. In this tutorial, we will learn how to read files from Excel, how to modify these files, and how to write them to Excel. We will use several Pandas functions for these approaches.
As you go over this article, you can play the article video for further guidance:
Reading Excel sheets into a Pandas data frame
To read data into a Pandas data frame from an Excel sheet, we use the Pandas read_excel()
function. This function provides us with a wide range of parameters to read in our data in various ways. To get an overview of all the different parameters, have a look at the official documentation.
Letβs start with a simple example. The Excel data we are using looks like this:
The dataset contains fictional persons with their respective name, gender, job, and age. As we can see, there are no additional indexes, just the ones Excel provides us with (the row numbers on the left). Now we move over to our code editor to read in the dataset there:
import pandas as pd df = pd.read_excel("excel_file.xlsx")
The DataFrame looks like this:
name | gender | job | age | |
0 | Alice | f | manager | 31 |
1 | John | m | software engineer | 28 |
2 | Bob | m | farmer | 43 |
3 | Tracy | f | doctor | 40 |
First of all, we import the Pandas library. Then we create a data frame using the function read_excel()
and assign it to the variable βdf
β. Inside the function, we put in the name of the Excel file as a string. Finally, we output the data frame. Note that to make it work like this, the Pandas script and the Excel file have to be in the same folder. If this is not the case, we have to put in the file path where the Excel file is stored.
As mentioned earlier, we have lots of parameters we can use inside the read_excel()
function. One of them is the βdtype
β parameter which allows us to specify the data type of a column when it gets imported. For example, we could specify the data type of the βage
β column to be βfloat
β. We do it this way:
df = pd.read_excel("excel_file.xlsx", dtype={"age": float})
This results in the following:
name | gender | job | age | |
0 | Alice | f | manager | 31.0 |
1 | John | m | software engineer | 28.0 |
2 | Bob | m | farmer | 43.0 |
3 | Tracy | f | doctor | 40.0 |
We assign the βdtypeβ parameter a dictionary of all the column data types, we want to specify. We pass in the columnβs name as the key and the desired data type as the value of the dictionary. As we can see in the output, the data type indeed changed to float because all the βageβ values now have a β.0β at the end. To be 100% sure that the data type is βfloatβ, we can check it this way:
>>> df["age"].dtype dtype('float64')
The data type of the βage
β column is βfloat64
β.
The Excel file contains two tables, namely βTable 1β and βTable 2β.
By default, the read_excel()
function chooses the first table which contained the dataset we have seen above. Using the βsheet_name
β parameter, we can specify which table we want to choose. So now, letβs create a second data frame with the dataset from βTable 2β. This is what the dataset looks like in the Excel file:
Now we read in another data frame applying the βsheet_name
β parameter:
df2 = pd.read_excel("excel_file.xlsx", sheet_name="Table 2")
Unnamed: 0 | animal | food | habitat | |
0 | 1 | dog | dog food | land |
1 | 2 | sea turtle | seaweeds | sea |
2 | 3 | dolphin | fish | sea |
3 | 4 | eagle | mouse | sky |
We assign the name of the table we want to access as a string to the βsheet_name
β parameter. But what is that second column? Pandas created an additional column called βUnnamed: 0β. Note that this datasetβs first column in the Excel file has its own indexes, unlike the first dataset which did not have its own indexes. We can fix this using the βindex_col
β parameter. Here we specify the column which to use for the row labels. We specify this parameter when our dataset has its own indexes:
df2 = pd.read_excel("excel_file.xlsx", sheet_name="Table 2", index_col=0)
animal | food | habitat | |
1 | dog | dog food | land |
2 | sea turtle | seaweeds | sea |
3 | dolphin | fish | sea |
4 | eagle | mouse | sky |
We apply β0β to the βindex_colβ parameter to tell the program we want to use the first row of the dataset as the indexes for the data frame.
Modifying the Imported Files with Pandas
After we imported the Excel sheets and created new data frames from them, we can analyze and modify our data frames with Pandas. This step is not essential for this tutorial. However, we will take this step anyway just to give you an understanding of the whole workflow of reading in an Excel sheet, modifying the data, and writing the modified data back to an Excel sheet.
There are endless opportunities to perform here, but we will keep it simple and change one thing only. Letβs take the data frame we used in the first place:
This is how our basic DataFrame, named df
, looks:
name | gender | job | age | |
0 | Alice | f | manager | 31.0 |
1 | John | m | software engineer | 28.0 |
2 | Bob | m | farmer | 43.0 |
3 | Tracy | f | doctor | 40.0 |
Now, we modify the data frame and delete the βgenderβ column:
df3 = df.drop(columns=["gender"])
name | job | age | |
0 | Alice | manager | 31.0 |
1 | John | software engineer | 28.0 |
2 | Bob | farmer | 43.0 |
3 | Tracy | doctor | 40.0 |
We got rid of the βgender
β column by applying the drop()
function. Inside the function, we assigned the column βgender
β as a list to the βcolumns
β parameter to state which column we want to drop. The outputted data frame shows the initial data frame without the βgender
β column.
Now we changed the data frame a little bit. In the next step, we will see how to write this new data frame to an Excel sheet.
Writing Objects to an Excel Sheet
When we are done with our data analysis in Pandas and modified our data frames, we write these data frames to an Excel sheet to make them better available and visible. This is where the to_excel() function comes into play. For an overview of all the parameters, have a look at the official documentation. The approach is very straightforward:
df3.to_excel("new_excel_file.xlsx")
We choose the data frame that we want to export to an Excel sheet. We are using βdf3
β which we created in the last paragraph. Inside the to_excel()
function, we put in the name of the file (βnew_excel_file
β) and the file extension (β.xlsx
β) as a string. Note that we do not get an output here. Instead, a new file gets created and it is stored in the same folder as the Pandas script. This is what the file looks like:
As we can see, there are two different types of indexes here: The indexes in column βAβ and the Excel indexes on the left of column βAβ. The extra indexes in column βAβ are redundant and we want to get rid of them, so we modify the to_excel()
function a little bit by using the βindexβ parameter:
df3.to_excel("new_excel_file.xlsx", index=False)
We assigned the value βFalse
β to the βindex
β parameter to state that we do not want to use the Pandasβ indexes. The Excel file now looks like this:
In addition to that, we can specify the sheet name by applying the βsheet_name
β parameter:
df3.to_excel("new_excel_file.xlsx", index=False, sheet_name="Table 1")
We call the sheet βTable 1β. At the bottom of the file, we can see the new sheet name:
Writing Multiple Objects to Excel Files
We have seen how to write a data frame to an Excel file using the to_excel()
function. Using this method allows us to write one object only, but we might want to write multiple objects to Excel. Therefore, we use an ExcelWriter
object. ExcelWriter
is a class for writing data frames into Excel sheets. It is necessary for writing multiple data frames into Excel sheets and modifying existing Excel sheets (official docs). Letβs start with an example of two data frames being written into a new Excel sheet. We are using the data frames βdf2
β and βdf3
β from above:
with pd.ExcelWriter("new_file_2.xlsx") as writer: df2.to_excel(writer, sheet_name="Table 1", index=False) df3.to_excel(writer, sheet_name="Table 2", index=False)
The with
statement is a Python statement used for exception handling and makes our code easier to read. The code here says, in the first line we create a new Excel file (βnew_file_2.xlsx
β) using the ExcelWriter and assign it to a variable called βwriter
β. In the inserted two lines afterward, we write the two data frames into the newly created Excel file using the to_excel()
function that we already know. Inside the to_excel()
function we respectively put in the variable βwriter
β as the path. We also use the βsheet_name
β parameter and the respective name of the sheet and set the βindex
β parameter to βFalse
β to get rid of the extra indexes. This is what the first sheet in the new file looks like in Excel:
And this is the second sheet:
The ExcelWriter also allows us to append to an existing Excel file. We will see how to do this by appending the data frame βdf
β we used at the very beginning to the existing file βnew_file_2.xslx
β in a new sheet βTableΒ 3β:
with pd.ExcelWriter("new_file_2.xlsx", mode="a") as writer: df.to_excel(writer, sheet_name="Table 3", index=False)
There is only one real difference to the code from the example before. We only add the parameter βmode
β inside the ExcelWriter
and assign it the string "a"
which stands for βappendβ. So here, we are appending the data frame βdf
β to the existing Excel file and store it in a new sheet called βTable 3β. This is what the data frame in sheet 3 looks like:
And as you can see, we indeed have three sheets within this file:
Summary
In summary, it can be said that working properly with Excel files is very essential when using Pandas. The datasets we want to analyze often come from Excel sheets and when we are done analyzing, the modified datasets often have to be written back to Excel sheets to make the data more accessible. Using the shown Excel functions that Pandas provides us with make these processes pretty straightforward for us and they help us to provide a good and clear workflow when working with Excel files.
For more tutorials about Pandas, Python libraries, Python in general, or other computer science-related topics, check out the Finxter Blog page and check out the free Finxter email academy:
Happy Coding!