Microsoft Excel is a cross-platform and battle-tested spreadsheet software widely used for data analysis and visualization. It is a powerful and user-friendly tool indeed! But how can we bring it to the next level?
We can combine Excel with Python to
- read data from Excel,
- save data to Excel with formats, formulas, and even charts, and
- automate Excel tasks in Python!
Please continue reading and stay tuned for my Excel in Python series if it sounds great!
This tutorial is all about saving data to Excel.
Concretely, I will first introduce Excel’s data structure and lingos.
Then, you will learn the difference between four popular Python ways to save data to excel, including pandas
and openpyxl
.
Finally, I will focus on the pandas.DataFrame.to_excel()
method. I will guide you through four actual use cases in pandas.DataFrame.to_excel()
, ranging from one Excel worksheet, multiple Excel worksheets, multiple Excel workbooks, and dealing with index cases.
You can find all datasets and codes in this tutorial here. The data in our example datasets are not actual data and are only used for educational purposes.
Quick Introduction to Excel
Before jumping into saving data to Excel, let’s look at how data is stored in Excel and get ourselves familiar with some Excel lingos.
Simply put, data is stored in cells in Excel, and each cell can be identified with its unique row and column number pair.
Columns in Excel are labeled in alphabets, starting from ‘A
’, and rows in Excel are labeled in roman numbers, starting from ‘1
’. For example, in the following Excel picture, A1
is the cell in the intersection of the first column and first row, ‘ID’.
Yes, you got the idea! It is just like the DataFrame data structure in our friends, pandas.
And if you want to select an area to dump data into Excel, you can imagine drawing a rectangle data region based on the upper-left cell of the part.
On top of that, other commonly seen Excel lingos include worksheets and workbooks.
- A worksheet means a single spreadsheet in an Excel file.
- A workbook means a single Excel file with extensions like
.xlsx
and.xls
.
Tip: More file extensions supported by Excel can be found in the Microsoft official doc here.
Now, you are ready to know how to save data to Excel!
Popular Python ways to Save Data to Excel
Here is the summary of popular Python ways to save data to excel:
One of pandas.DataFrame.to_excel()
’s advantages is engine compatibility. It uses multiple external engines for different Excel extensions, including openpyxl
, xlwt
, and xlsxwriter
.
- “
xlwt
” supports old-style Microsoft Excel versions 95 to 2003 files (.xls
). - “
openpyxl
” supports newer Excel 2010 file formats (.xlsx, .xlsm, .xltx, .xltm
). - “
xlsxwriter
” supports Excel 2007+ file format (.xlsx
).
In summary, you can use openpyxl
and other libraries for specific Excel file formats and basic data processing. And please remember that openpyxl
and xlsxwriter
support newer Excel file formats.
However, I recommend using pandas.DataFrame.to_excel()
for data science and analytics applications because it supports most Excel file formats and accepts a DataFrame object with powerful methods.
Therefore, let’s see how to use pandas.DataFrame.to_excel()
to save data to Excel!
Saving Data to Excel – pandas.DataFrame.to_excel()
Since pandas.DataFrame.to_excel()
is the most powerful and inclusive way to save data from Excel files with different extensions, I will first introduce its syntax and walk you through three use cases with real Python codes below using pandas.DataFrame.to_excel()
.
Meet pandas.ExcelWriter
Before heading over to the pandas.DataFrame.to_excel
method, we need to know a new friend, pandas.ExcelWriter
. It is a class for writing pandas.DataFrame
objects into excel sheets.
When you are trying to write to multiple sheets, you need to create an ExcelWriter
object and pass it to pandas.DataFrame.to_excel()
as the first parameter.
To create an ExcelWriter
object, we pass something to it according to its syntax:
# Syntax for pandas.ExcelWriter pandas.ExcelWriter(path[, engine=None[, date_format=None[, datetime_format=None[, mode='w'[, storage_options=None[, if_sheet_exists=None[, engine_kwargs=None[, **kwargs]]]]]])
Parameter | Meta | Description |
---|---|---|
path | required | String or typing.BinaryIO to an Excel file. |
engine | optional | String specifying the engine to use for writing. Default is to use : xlwt for xls xlsxwriter for xlsx if xlsxwriter is installed otherwise openpyxl odf for ods |
date_format | optional | Format string for dates written into Excel files (e.g. 'YYYY-MM-DD' ). |
datetime_format | optional | Format string for datetime objects written into Excel files. (e.g. 'YYYY-MM-DD HH:MM:SS' ). |
mode | optional | File mode to use ('w' for write; 'a' for append). The default is to use 'w' . |
storage_options | optional | A dictionary that makes sense for a particular storage connection, e.g. host, port, username, password, etc. |
if_sheet_exists | optional | What to do if a sheet exists under the append mode. Accepts {'error', 'new', 'replace', 'overlay'} . Default is 'error' .error : raise a ValueError .new : create a new sheet, with a name determined by the engine.replace : delete the contents of the sheet before writing to it.overlay : write contents to the existing sheet without removing the old contents. |
engine_kwargs | optional | A dictionary containing keyword arguments passed into the engine. |
💡 Tip: Generally, you only need to use the four parameters in bold format. The default engine is automatically chosen depending on the file extension.
A small note: For compatibility with CSV writers, ExcelWriter
serializes lists and dicts to strings before writing.
Last, if you use the with
statement to create an ExcelWriter
, you do not need to worry about saving the change in the end! An example code to create an ExcelWriter
in the with
statement is:
import pandas as pd df = pd.DataFrame() excel_wb56_filepath = 'learn_excel_56.xlsx' # Syntax for create an ExcelWriter object in the with statement # To create a new empty Excel file, learn_excel_56.xlsx with two new sheets! with pd.ExcelWriter(excel_wb56_filepath) as writer: df.to_excel(writer, sheet_name='Class 5') df.to_excel(writer, sheet_name='Class 6')
After running the code, we create a new empty Excel file, learn_excel_56.xlsx
with two new sheets, Class 5 and Class 6!
Meet pandas.DataFrame.to_excel
Hooray! Now, let’s look at the syntax and parameters of the pandas.DataFrame.to_excel
method and get ourselves prepared for later examples!
Here is the syntax for pandas.DataFrame.to_excel
:
# Syntax for pandas.DataFrame.to_excel DataFrame.to_excel(excel_writer[, sheet_name='Sheet1'[, na_rep=''[, float_format=None[, columns=None[, header=True[, index=True[, index_label=None[, startrow=0[, startcol=0[, engine=None[, merge_cells=True[, encoding=None[, inf_rep='inf'[, verbose=True[, freeze_panes=None[, storage_options=None]]]]]]]]]]]]]]]])
To write a single object (pandas.DataFrame
) to an Excel .xlsx
file, it is only necessary to specify a target file name. To write to multiple sheets, it is necessary to create an ExcelWriter
object with a target file name.
Parameter | Meta | Description |
---|---|---|
excel_writer | Required | Target file name or ExcelWriter object. |
sheet_name | Optional | Name of sheet containing DataFrame. Default is "Sheet 1" . |
na_rep | Optional | Missing data representation. Default = "" |
float_format | Optional | Format string for floating point numbers. For example float_format="%.2f" will format 0.1234 to 0.12. |
column | Optional | Columns (in the DataFrame) to write. |
header | Optional | Row to be considered as the header, excluding from the data part. Default = True , which means the first row. If None , no header. |
index | Optional | Write row names (index). Default is True , which means show index. If set to be False , it means no index in the output Excel worksheet. |
index_label | Optional | Column label for the index. |
startrow | Optional | Upper left cell row to dump data frame. Default is 0. |
startcol | Optional | Upper left cell column to dump data frame. Default is 0. |
engine | Optional | String specifying the engine to use for writing. Default is xlwt for xls . xlsxwriter for xlsx if xlsxwriter is installed; otherwise openpyxl . |
merge_cells | Optional | Write MultiIndex and Hierarchical Rows as merged cells. |
encoding | Optional | Encoding of the resulting excel file. Only necessary for xlwt , other writers support Unicode natively. |
inf_rep | Optional | Representation for infinity (there is no native representation for infinity in Excel). |
verbose | Optional | Display more information in the error logs. |
freeze_panes | Optional | Specifies the one-based bottom-most row and right-most column that is to be frozen. |
storage_options | Optional | Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. |
Tip: Generally, you only need to use the four parameters in bold format.
Please try to have an intuition for these parameters right now and I will cover the details for sheet_name
, index
, and index_label
parameters in our next exciting examples!
Getting started
To use the pandas.DataFrame.to_excel
method, you need to first install the pandas
package in your command line:
$ pip install pandas
Tip: you might need to use pip3
instead of pip
, depending on your environment.
Given the engine compatibility mentioned above, you also need to install respective engine libraries. For example, to use openpyxl
, you need to install this package on your command line:
$ pip install openpyxl
Tip: you might need to use pip3
instead of pip
, depending on your environment.
By the way, if you have already installed Anaconda, you can skip this step 🙂
Dataset
In our examples, we will create Excel workbooks like the two Excel workbooks (.xlsx
), learn_excel_12
and learn_excel_34
, in our first Excel in Python series.
These workbooks have the same data structures and column names in each worksheet. For example, the following is the data in the Class_1
worksheet in the learn_excel_12
workbook.
One row in each sheet represents a student in each class and the columns represent the student’s ID, Major, and Score respectively.
Concretely, learn_excel_12
has two worksheets, Class 1 and Class 2. And learn_excel_34
has two worksheets, Class 3 and Class 4.
You can find all datasets and codes in this tutorial here. The data in our example datasets are not actual data and are only used for educational purposes.
Save Data to One Worksheet
So, how can we save data to a single excel sheet? We can pass a target file name or create an ExcelWriter
object to do so!
For our example, we can create a new Excel file, learn_excel_56
and write some student data into the Class 5 sheet, specifying through the parameter sheet_name
.
import pandas as pd # For our writing mode example, # we can create a new Excel file, learn_excel_56 # and write some student data into it. excel_fp = 'learn_excel_56.xlsx' class5_df = pd.DataFrame( {'ID': [51, 52], 'Major': ['English', 'Math'], 'Score': [98, 89]}) with pd.ExcelWriter(excel_fp, mode='w') as writer: class5_df.to_excel(writer, sheet_name='Class 5', index=False)
After running the code, we can get the output Excel file:
Save Data to Multiple Worksheets
Likewise, we can save data to multiple worksheets by calling pandas.DataFrame.to_excel()
method multiple times.
For our example, we can create a new Excel file, learn_excel_56
and write some student data into the Class 5 sheet and Class 6 sheet, specifying through the parameter sheet_name
.
import pandas as pd # For our writing mode example, # we can create a new Excel file, learn_excel_56 # and write some student data into it. excel_fp = 'learn_excel_56.xlsx' class5_df = pd.DataFrame( {'ID': [51, 52], 'Major': ['English', 'Math'], 'Score': [98, 89]}) class6_df = pd.DataFrame( {'ID': [61, 62], 'Major': ['History', 'Math'], 'Score': [78, 96]}) with pd.ExcelWriter(excel_fp, mode='w') as writer: class5_df.to_excel(writer, sheet_name='Class 5', index=False) class6_df.to_excel(writer, sheet_name='Class 6', index=False)
After running the code, we can get the output Excel file:
Save Data to Multiple Workbooks
To get multiple workbooks, we can just create multiple pandas.ExcelWriter
objects.
🙂
To give a quick example, let’s create two workbooks, learn_excel_78
and learn_excel_910
.
import pandas as pd # let’s create two workbooks, learn_excel_78 and learn_excel_910. wb78_fp = 'learn_excel_78.xlsx' wb910_fp = 'learn_excel_910.xlsx' df = pd.DataFrame() with pd.ExcelWriter(wb78_fp, mode='w') as writer_78, pd.ExcelWriter(wb910_fp, mode='w') as writer_910: df.to_excel(writer_78, sheet_name='Class 7', index=False) df.to_excel(writer_78, sheet_name='Class 8', index=False) df.to_excel(writer_910, sheet_name='Class 9', index=False) df.to_excel(writer_910, sheet_name='Class 10', index=False)
After running the code, we can get the output workbooks:
In the learn_excel_78
file, we can see that we have created two empty worksheets:
So far, we have understood the basic writing operations. Let’s move forward and deal with the most common issue—index 🙂
Deal with Index
You can set up the index and index’ column label when calling the pandas.DataFrame.to_excel()
method.
Previously, our example codes set the index to be False
, which means no index column in the output Excel file. Let’s see what will happen if we set the index to be True
in the multiple worksheets scenario.
import pandas as pd # For our writing mode example, # we can create a new Excel file, learn_excel_56 # and write some student data into it. excel_fp = 'learn_excel_56.xlsx' class5_df = pd.DataFrame( {'ID': [51, 52], 'Major': ['English', 'Math'], 'Score': [98, 89]}) class6_df = pd.DataFrame( {'ID': [61, 62], 'Major': ['History', 'Math'], 'Score': [78, 96]}) with pd.ExcelWriter(excel_fp, mode='w') as writer: class5_df.to_excel(writer, sheet_name='Class 5', index=True) class6_df.to_excel(writer, sheet_name='Class 6', index=True)
After running the code, we can see that we now have an index column, counting from zero.
On top of that, we can give a column name to the index column by specifying the parameter, index_label
.
import pandas as pd # For our writing mode example, # we can create a new Excel file, learn_excel_56 # and write some student data into it. excel_fp = 'learn_excel_56.xlsx' class5_df = pd.DataFrame( {'ID': [51, 52], 'Major': ['English', 'Math'], 'Score': [98, 89]}) class6_df = pd.DataFrame( {'ID': [61, 62], 'Major': ['History', 'Math'], 'Score': [78, 96]}) with pd.ExcelWriter(excel_fp, mode='w') as writer: class5_df.to_excel(writer, sheet_name='Class 5', index=True, index_label='No.') class6_df.to_excel(writer, sheet_name='Class 6', index=True, index_label='No.')
After running the code, we can see that we now have an index column with a name, “No.”!
Conclusion
That’s how to save data to Excel. This is the second article in our Excel in Python series.
We learned about Excel’s data structure and commonly used lingos and four popular ways to save data to Excel in Python, including pandas
and openpyxl
.
Finally, we looked at four actual use cases in pandas.DataFrame.to_excel
, ranging from one Excel worksheet, multiple Excel worksheets, multiple Excel workbooks, and dealing with index cases.
I hope you enjoy all this, and stay tuned for our following Excel in Python article on saving data to Excel! Happy coding!
More Pandas DataFrame Methods
Feel free to learn more about the previous and next pandas DataFrame methods (alphabetically) here:
Also, check out the full cheat sheet overview of all Pandas DataFrame methods.