Pandas Read and Write Excel Files - Be on the Right Side of Change

Over your career as a Data Scientist, there may be instances where you will work with data to/from a DataFrame to an Excel file. This article shows you how to manipulate this data using the above functions.

This article covers the commonly used parameters for each function listed above. For a complete list of all parameters and their use, click here.

⭐ Note: This article assumes that Excel resides on the computer.

Preparation

Before any data manipulation can occur, one (1) library will require installation.

The Pandas library enables access to/from a DataFrame.

To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.

Feel free to view the PyCharm installation guide for the required library.

How to install Pandas on PyCharm

Remember to add the Required Starter Code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd

Read Excel File

Function Outline

pandas.read_excel(io, sheet_name=0, header=0, names=None, 
                  index_col=None, usecols=None, squeeze=False, 
                  dtype=None, engine=None, converters=None, 
                  true_values=None, false_values=None, skiprows=None, 
                  nrows=None, na_values=None, keep_default_na=True, 
                  na_filter=True, verbose=False, parse_dates=False, 
                  date_parser=None, thousands=None, comment=None, 
                  skipfooter=0, convert_float=None, 
                  mangle_dupe_cols=True, storage_options=None)

The read_excel() function reads an Excel file into a pandas DataFrame.

For this example, we will save a list of the Titanic’s 2nd Class Passengers in Excel. To follow along, perform the following steps.

Navigate to https://titanicfacts.net/titanic-passenger-list/#2ndclass.
Highlight the 2nd Class Passenger list (including the header row).
Press CTRL+C to copy the highlighted area to the system Clipboard.
Open Excel.
Click inside the A1 cell.
Press CTRL+V to paste the contents of the system Clipboard to the Excel Worksheet.
Save the Excel file as titanic.xlsx and place this file in the current working directory.

Below are the first few lines of the file.

To read this file into a DataFrame, run the following code.

df = pd.read_excel('titanic.xlsx')
print(df.head(4))

Line [1] reads in the titanic.xlsx file created earlier.
Line [2] outputs five rows: one header row and four data rows to the terminal.

Output

	Surname	First Names	Age	Boarded	Survivor (S) or Victim (†)
0	Abelson	Mr Samuel	30	Cherbourg	†
1	Abelson	Mrs Hannah	28	Cherbourg	S
2	Aldworth	Mr Augustus Henry	34	Southampton	†
3	Andrew	Mr Frank Thomas	25	Southampton	†

To hide the index column (far left column above), set the index_col parameter as follows:

df = pd.read_excel('titanic.xlsx', index_col=0)

Surname	First Names	Age	Boarded	Survivor (S) or Victim (†)
Abelson	Mr Samuel	30	Cherbourg	†
Abelson	Mrs Hannah	28	Cherbourg	S
Aldworth	Mr Augustus Henry	34	Southampton	†
Andrew	Mr Frank Thomas	25	Southampton	†

To change the header row a name to a number, set the header parameter as follows:

df = pd.read_excel('titanic.xlsx', header=None)

	0	1	2	3	4
0	Abelson	Mr Samuel	30	Cherbourg	†
1	Abelson	Mrs Hannah	28	Cherbourg	S
2	Aldworth	Mr Augustus Henry	34	Southampton	†
3	Andrew	Mr Frank Thomas	25	Southampton	†

For a list of available parameters for this function, click here.

DataFrame to Excel

Function Outline

DataFrame.to_excel(excel_writer, sheet_name='Sheet1', na_rep='', 
                   float_format=None, columns=None, header=True, 
                   index=True, index_label=None, startrow=0, 
                   startcol=0, engine=None, merge_cells=True, 
                   encoding=None, inf_rep='inf', verbose=True, 
                   freeze_panes=None, storage_options=None)

The to_excel() function writes a DataFrame to an Excel Worksheet.

For this example, a new Excel file emerges from the original file with some differences.

df = pd.read_excel('titanic.xlsx')
cols = ['Surname', 'First Names', 'Age']
df.to_excel('titanic1.xlsx', sheet_name='Sheet1', columns=cols, freeze_panes=(1,4))

Line [1] reads in the titanic.xlsx file created earlier.
Line [2] creates a list with the columns we want in the new Excel file.
Line [3] saves a new Excel file with the following options:
- Indicates the Worksheet to read in (optional).
- Sets the columns to show (see column list on Line [3]).
- Freezes the first row, four columns (pane).

Output

Below are the first few lines of the file. Try scrolling to ensure the freeze pane option worked.

For a list of available parameters for this function, click here.

Excel Styler and Excel Writer

Function Outline

Styler.to_excel(excel_writer, sheet_name='Sheet1', na_rep='', 
                float_format=None, columns=None, header=True, 
                index=True, index_label=None, startrow=0, startcol=0, 
                engine=None, merge_cells=True, encoding=None, 
                inf_rep='inf', verbose=True, freeze_panes=None)

This function writes styles to an Excel Worksheet.

class pandas.ExcelWriter(path, engine=None, date_format=None, 
                         datetime_format=None, mode='w', 
                         storage_options=None, if_sheet_exists=None, 
                         engine_kwargs=None, **kwargs)

This function is a class for writing a DataFrame to an Excel Worksheet.

For this example, we will be creating a dictionary of employees for a small start-up IT company. Run the following code to create a new Excel file, salaries.xlsx.

df = pd.DataFrame({'Fname':     ['Anne', 'Mark', 'Paul', 'Ben', 'Micah'],
                 		'Salaries':   [98273,  84900,  120876, 52759, 99767],
                   		'Profit':       [.15,    .23,     .8 ,    .7,    .3]})

xw = pd.ExcelWriter("salaries.xlsx", engine='xlsxwriter')
df.to_excel(xw, sheet_name='Sheet1')

workbook  = xw.book
worksheet = xw.sheets['Sheet1']

sal = workbook.add_format({'num_format': '$#,##0.00'})
pro = workbook.add_format({'num_format': '0%'})

worksheet.set_column('B:B', 15)
worksheet.set_column('C:C', 25, sal)
worksheet.set_column('D:D', 26, pro)
xw.save()

Line [1] creates a new DataFrame: a dictionary of lists.
Line [2-3] sets up the Excel file, the engine, and the Worksheet.
Line [4-5] assigns the Workbook and Worksheet.
Line [6-7] sets up the formatting for the Salaries and Profit fields.
Line [8-10] sets the width of each column and the styles if needed.
Line [11] saves the new Excel file.

Output

For a list of available parameters for these functions, click here.