Pandas DataFrame.to_excel() – An Unofficial Guide to Saving Data to Excel

Rate this post

Microsoft Excel is a cross-platform and battle-tested spreadsheet software widely used for data analysis and visualization. It is a powerful and user-friendly tool indeed! But how can we bring it to the next level?

We can combine Excel with Python to

  • read data from Excel,
  • save data to Excel with formats, formulas, and even charts, and
  • automate Excel tasks in Python!

Please continue reading and stay tuned for my Excel in Python series if it sounds great!

This tutorial is all about saving data to Excel.

Concretely, I will first introduce Excel’s data structure and lingos.

Then, you will learn the difference between four popular Python ways to save data to excel, including pandas and openpyxl.

Finally, I will focus on the pandas.DataFrame.to_excel() method. I will guide you through four actual use cases in pandas.DataFrame.to_excel(), ranging from one Excel worksheet, multiple Excel worksheets, multiple Excel workbooks, and dealing with index cases. 

You can find all datasets and codes in this tutorial here. The data in our example datasets are not actual data and are only used for educational purposes.

Quick Introduction to Excel

Before jumping into saving data to Excel, let’s look at how data is stored in Excel and get ourselves familiar with some Excel lingos.

Simply put, data is stored in cells in Excel, and each cell can be identified with its unique row and column number pair. 

Columns in Excel are labeled in alphabets, starting from ‘A’, and rows in Excel are labeled in roman numbers, starting from ‘1’. For example, in the following Excel picture, A1 is the cell in the intersection of the first column and first row, ‘ID’.

Yes, you got the idea! It is just like the DataFrame data structure in our friends, pandas.

And if you want to select an area to dump data into Excel, you can imagine drawing a rectangle data region based on the upper-left cell of the part.

On top of that, other commonly seen Excel lingos include worksheets and workbooks. 

  • A worksheet means a single spreadsheet in an Excel file. 
  • A workbook means a single Excel file with extensions like .xlsx and .xls

Tip: More file extensions supported by Excel can be found in the Microsoft official doc here.

Now, you are ready to know how to save data to Excel!

Popular Python ways to Save Data to Excel

Here is the summary of popular Python ways to save data to excel:

save data to excel python pandas

One of pandas.DataFrame.to_excel()’s advantages is engine compatibility. It uses multiple external engines for different Excel extensions, including openpyxl, xlwt, and xlsxwriter.

  • xlwt” supports old-style Microsoft Excel versions 95 to 2003 files (.xls).
  • openpyxl” supports newer Excel 2010 file formats (.xlsx, .xlsm, .xltx, .xltm).
  • xlsxwriter” supports Excel 2007+ file format (.xlsx).

In summary, you can use openpyxl and other libraries for specific Excel file formats and basic data processing. And please remember that openpyxl and xlsxwriter support newer Excel file formats

However, I recommend using pandas.DataFrame.to_excel() for data science and analytics applications because it supports most Excel file formats and accepts a DataFrame object with powerful methods.

Therefore, let’s see how to use pandas.DataFrame.to_excel() to save data to Excel!

Saving Data to Excel – pandas.DataFrame.to_excel()

Since pandas.DataFrame.to_excel() is the most powerful and inclusive way to save data from Excel files with different extensions, I will first introduce its syntax and walk you through three use cases with real Python codes below using pandas.DataFrame.to_excel().

Meet pandas.ExcelWriter

Before heading over to the pandas.DataFrame.to_excel method, we need to know a new friend, pandas.ExcelWriter. It is a class for writing pandas.DataFrame objects into excel sheets.

When you are trying to write to multiple sheets, you need to create an ExcelWriter object and pass it to pandas.DataFrame.to_excel() as the first parameter.

To create an ExcelWriter object, we pass something to it according to its syntax:

# Syntax for pandas.ExcelWriter
pandas.ExcelWriter(path[, engine=None[, date_format=None[, datetime_format=None[, mode='w'[, storage_options=None[, if_sheet_exists=None[, engine_kwargs=None[, **kwargs]]]]]])
ParameterMetaDescription
pathrequiredString or typing.BinaryIO to an Excel file.
engineoptionalString specifying the engine to use for writing. Default is to use : 
xlwt for xls 
xlsxwriter for xlsx if xlsxwriter is installed otherwise openpyxl
odf for ods
date_formatoptionalFormat string for dates written into Excel files (e.g. 'YYYY-MM-DD').
datetime_formatoptionalFormat string for datetime objects written into Excel files. (e.g. 'YYYY-MM-DD HH:MM:SS').
modeoptionalFile mode to use ('w' for write; 'a' for append). The default is to use 'w'.
storage_optionsoptionalA dictionary that makes sense for a particular storage connection, e.g. host, port, username, password, etc.
if_sheet_existsoptionalWhat to do if a sheet exists under the append mode. Accepts {'error', 'new', 'replace', 'overlay'}. Default is 'error'.
error: raise a ValueError.
new: create a new sheet, with a name determined by the engine.
replace: delete the contents of the sheet before writing to it.
overlay: write contents to the existing sheet without removing the old contents.
engine_kwargsoptionalA dictionary containing keyword arguments passed into the engine. 

💡 Tip: Generally, you only need to use the four parameters in bold format. The default engine is automatically chosen depending on the file extension.

A small note: For compatibility with CSV writers, ExcelWriter serializes lists and dicts to strings before writing.

Last, if you use the with statement to create an ExcelWriter, you do not need to worry about saving the change in the end! An example code to create an ExcelWriter in the with statement is:

import pandas as pd

df = pd.DataFrame()
excel_wb56_filepath = 'learn_excel_56.xlsx'

# Syntax for create an ExcelWriter object in the with statement
# To create a new empty Excel file, learn_excel_56.xlsx with two new sheets!
with pd.ExcelWriter(excel_wb56_filepath) as writer:
    df.to_excel(writer, sheet_name='Class 5')
    df.to_excel(writer, sheet_name='Class 6')

After running the code, we create a new empty Excel file, learn_excel_56.xlsx with two new sheets, Class 5 and Class 6!

Meet pandas.DataFrame.to_excel

Hooray! Now, let’s look at the syntax and parameters of the pandas.DataFrame.to_excel method and get ourselves prepared for later examples!

Here is the syntax for pandas.DataFrame.to_excel:

# Syntax for pandas.DataFrame.to_excel
DataFrame.to_excel(excel_writer[, sheet_name='Sheet1'[, na_rep=''[, float_format=None[, columns=None[, header=True[, index=True[, index_label=None[, startrow=0[, startcol=0[, engine=None[, merge_cells=True[, encoding=None[, inf_rep='inf'[, verbose=True[, freeze_panes=None[, storage_options=None]]]]]]]]]]]]]]]])

To write a single object (pandas.DataFrame) to an Excel .xlsx file, it is only necessary to specify a target file name. To write to multiple sheets, it is necessary to create an ExcelWriter object with a target file name.

ParameterMetaDescription
excel_writerRequiredTarget file name or ExcelWriter object.
sheet_nameOptionalName of sheet containing DataFrame. Default is "Sheet 1".
na_repOptionalMissing data representation. Default = ""
float_formatOptionalFormat string for floating point numbers. For example float_format="%.2f" will format 0.1234 to 0.12.
columnOptionalColumns (in the DataFrame) to write.
headerOptionalRow to be considered as the header, excluding from the data part. Default = True, which means the first row. If None, no header.
indexOptionalWrite row names (index). Default is True, which means show index. If set to be False, it means no index in the output Excel worksheet.
index_labelOptionalColumn label for the index.
startrowOptionalUpper left cell row to dump data frame. Default is 0.
startcolOptionalUpper left cell column to dump data frame. Default is 0.
engineOptionalString specifying the engine to use for writing. Default is xlwt for xls.
xlsxwriter for xlsx if xlsxwriter is installed; otherwise openpyxl.
merge_cellsOptionalWrite MultiIndex and Hierarchical Rows as merged cells.
encodingOptionalEncoding of the resulting excel file. Only necessary for xlwt, other writers support Unicode natively.
inf_repOptionalRepresentation for infinity (there is no native representation for infinity in Excel).
verboseOptionalDisplay more information in the error logs.
freeze_panesOptionalSpecifies the one-based bottom-most row and right-most column that is to be frozen.
storage_optionsOptionalExtra options that make sense for a particular storage connection, e.g. host, port, username, password, etc.

Tip: Generally, you only need to use the four parameters in bold format. 

Please try to have an intuition for these parameters right now and I will cover the details for sheet_name, index, and index_label parameters in our next exciting examples!

Getting started

To use the pandas.DataFrame.to_excel method, you need to first install the pandas package in your command line:

$ pip install pandas 

Tip: you might need to use pip3 instead of pip, depending on your environment.

Given the engine compatibility mentioned above, you also need to install respective engine libraries. For example, to use openpyxl, you need to install this package on your command line:

$ pip install openpyxl

Tip: you might need to use pip3 instead of pip, depending on your environment.

By the way, if you have already installed Anaconda, you can skip this step 🙂

Dataset

In our examples, we will create Excel workbooks like the two Excel workbooks (.xlsx), learn_excel_12 and learn_excel_34, in our first Excel in Python series.

These workbooks have the same data structures and column names in each worksheet. For example, the following is the data in the Class_1 worksheet in the learn_excel_12 workbook.

One row in each sheet represents a student in each class and the columns represent the student’s ID, Major, and Score respectively. 

Concretely, learn_excel_12 has two worksheets, Class 1 and Class 2. And learn_excel_34 has two worksheets, Class 3 and Class 4.

You can find all datasets and codes in this tutorial here. The data in our example datasets are not actual data and are only used for educational purposes.

Save Data to One Worksheet

So, how can we save data to a single excel sheet? We can pass a target file name or create an ExcelWriter object to do so!

For our example, we can create a new Excel file, learn_excel_56 and write some student data into the Class 5 sheet, specifying through the parameter sheet_name.

import pandas as pd

# For our writing mode example,
# we can create a new Excel file, learn_excel_56
# and write some student data into it.
excel_fp = 'learn_excel_56.xlsx'
class5_df = pd.DataFrame(
    {'ID': [51, 52], 'Major': ['English', 'Math'], 'Score': [98, 89]})

with pd.ExcelWriter(excel_fp, mode='w') as writer:
    class5_df.to_excel(writer, sheet_name='Class 5', index=False)

After running the code,  we can get the output Excel file:

Save Data to Multiple Worksheets

Likewise, we can save data to multiple worksheets by calling pandas.DataFrame.to_excel() method multiple times.

For our example, we can create a new Excel file, learn_excel_56 and write some student data into the Class 5 sheet and Class 6 sheet, specifying through the parameter sheet_name.

import pandas as pd

# For our writing mode example,
# we can create a new Excel file, learn_excel_56
# and write some student data into it.
excel_fp = 'learn_excel_56.xlsx'
class5_df = pd.DataFrame(
    {'ID': [51, 52], 'Major': ['English', 'Math'], 'Score': [98, 89]})
class6_df = pd.DataFrame(
    {'ID': [61, 62], 'Major': ['History', 'Math'], 'Score': [78, 96]})

with pd.ExcelWriter(excel_fp, mode='w') as writer:
    class5_df.to_excel(writer, sheet_name='Class 5', index=False)
    class6_df.to_excel(writer, sheet_name='Class 6', index=False)

After running the code,  we can get the output Excel file:

Save Data to Multiple Workbooks

To get multiple workbooks, we can just create multiple pandas.ExcelWriter objects.

🙂

To give a quick example, let’s create two workbooks, learn_excel_78 and learn_excel_910.

import pandas as pd

# let’s create two workbooks, learn_excel_78 and learn_excel_910.
wb78_fp = 'learn_excel_78.xlsx'
wb910_fp = 'learn_excel_910.xlsx'

df = pd.DataFrame()

with pd.ExcelWriter(wb78_fp, mode='w') as writer_78, pd.ExcelWriter(wb910_fp, mode='w') as writer_910:
    df.to_excel(writer_78, sheet_name='Class 7', index=False)
    df.to_excel(writer_78, sheet_name='Class 8', index=False)
    df.to_excel(writer_910, sheet_name='Class 9', index=False)
    df.to_excel(writer_910, sheet_name='Class 10', index=False)

After running the code,  we can get the output workbooks:

In the learn_excel_78 file, we can see that we have created two empty worksheets:

So far, we have understood the basic writing operations. Let’s move forward and deal with the most common issue—index 🙂

Deal with Index

You can set up the index and index’ column label when calling the pandas.DataFrame.to_excel() method.

Previously, our example codes set the index to be False, which means no index column in the output Excel file. Let’s see what will happen if we set the index to be True in the multiple worksheets scenario.

import pandas as pd

# For our writing mode example,
# we can create a new Excel file, learn_excel_56
# and write some student data into it.
excel_fp = 'learn_excel_56.xlsx'
class5_df = pd.DataFrame(
    {'ID': [51, 52], 'Major': ['English', 'Math'], 'Score': [98, 89]})
class6_df = pd.DataFrame(
    {'ID': [61, 62], 'Major': ['History', 'Math'], 'Score': [78, 96]})

with pd.ExcelWriter(excel_fp, mode='w') as writer:
    class5_df.to_excel(writer, sheet_name='Class 5', index=True)
    class6_df.to_excel(writer, sheet_name='Class 6', index=True)

After running the code, we can see that we now have an index column, counting from zero.

On top of that, we can give a column name to the index column by specifying the parameter, index_label.

import pandas as pd

# For our writing mode example,
# we can create a new Excel file, learn_excel_56
# and write some student data into it.
excel_fp = 'learn_excel_56.xlsx'
class5_df = pd.DataFrame(
    {'ID': [51, 52], 'Major': ['English', 'Math'], 'Score': [98, 89]})
class6_df = pd.DataFrame(
    {'ID': [61, 62], 'Major': ['History', 'Math'], 'Score': [78, 96]})

with pd.ExcelWriter(excel_fp, mode='w') as writer:
    class5_df.to_excel(writer, sheet_name='Class 5',
                       index=True, index_label='No.')
    class6_df.to_excel(writer, sheet_name='Class 6',
                       index=True, index_label='No.')

After running the code, we can see that we now have an index column with a name, “No.”!

Conclusion

That’s how to save data to Excel. This is the second article in our Excel in Python series.

We learned about Excel’s data structure and commonly used lingos and four popular ways to save data to Excel in Python, including pandas and openpyxl.

Finally, we looked at four actual use cases in pandas.DataFrame.to_excel, ranging from one Excel worksheet, multiple Excel worksheets, multiple Excel workbooks, and dealing with index cases.

I hope you enjoy all this, and stay tuned for our following Excel in Python article on saving data to Excel! Happy coding!


More Pandas DataFrame Methods

Feel free to learn more about the previous and next pandas DataFrame methods (alphabetically) here:

Also, check out the full cheat sheet overview of all Pandas DataFrame methods.