Pandas Read and Write Excel Files

Over your career as a Data Scientist, there may be instances where you will work with data to/from a DataFrame to an Excel file. This article shows you how to manipulate this data using the above functions.

This article covers the commonly used parameters for each function listed above. For a complete list of all parameters and their use, click here.

⭐ Note: This article assumes that Excel resides on the computer.


Preparation

Before any data manipulation can occur, one (1) library will require installation.

  • The Pandas library enables access to/from a DataFrame.

To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.


Feel free to view the PyCharm installation guide for the required library.


Remember to add the Required Starter Code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd

Read Excel File

Function Outline

pandas.read_excel(io, sheet_name=0, header=0, names=None, 
                  index_col=None, usecols=None, squeeze=False, 
                  dtype=None, engine=None, converters=None, 
                  true_values=None, false_values=None, skiprows=None, 
                  nrows=None, na_values=None, keep_default_na=True, 
                  na_filter=True, verbose=False, parse_dates=False, 
                  date_parser=None, thousands=None, comment=None, 
                  skipfooter=0, convert_float=None, 
                  mangle_dupe_cols=True, storage_options=None)

The read_excel() function reads an Excel file into a pandas DataFrame.

For this example, we will save a list of the Titanic’s 2nd Class Passengers in Excel. To follow along, perform the following steps.

  • Navigate to https://titanicfacts.net/titanic-passenger-list/#2ndclass.
  • Highlight the 2nd Class Passenger list (including the header row).
  • Press CTRL+C to copy the highlighted area to the system Clipboard.
  • Open Excel.
  • Click inside the A1 cell.
  • Press CTRL+V to paste the contents of the system Clipboard to the Excel Worksheet.
  • Save the Excel file as titanic.xlsx and place this file in the current working directory.

Below are the first few lines of the file.

To read this file into a DataFrame, run the following code.

df = pd.read_excel('titanic.xlsx')
print(df.head(4))
  • Line [1] reads in the titanic.xlsx file created earlier.
  • Line [2] outputs five rows: one header row and four data rows to the terminal.

Output

 SurnameFirst NamesAge     BoardedSurvivor (S) or Victim (†)
0Abelson         Mr Samuel 30Cherbourg                         β€ 
1Abelson         Mrs Hannah 28Cherbourg                         S
2Aldworth Mr Augustus Henry 34Southampton                         β€ 
3Andrew   Mr Frank Thomas 25Southampton                         β€ 

To hide the index column (far left column above), set the index_col parameter as follows:

df = pd.read_excel('titanic.xlsx', index_col=0)
SurnameFirst NamesAge     BoardedSurvivor (S) or Victim (†)
Abelson         Mr Samuel 30Cherbourg                         β€ 
Abelson         Mrs Hannah 28Cherbourg                         S
Aldworth Mr Augustus Henry 34Southampton                         β€ 
Andrew   Mr Frank Thomas 25Southampton                         β€ 

To change the header row a name to a number, set the header parameter as follows:

df = pd.read_excel('titanic.xlsx', header=None)
 01234
0Abelson         Mr Samuel 30Cherbourg                         β€ 
1Abelson         Mrs Hannah 28Cherbourg                         S
2Aldworth Mr Augustus Henry 34Southampton                         β€ 
3Andrew   Mr Frank Thomas 25Southampton                         β€ 

For a list of available parameters for this function, click here.


DataFrame to Excel

Function Outline

DataFrame.to_excel(excel_writer, sheet_name='Sheet1', na_rep='', 
                   float_format=None, columns=None, header=True, 
                   index=True, index_label=None, startrow=0, 
                   startcol=0, engine=None, merge_cells=True, 
                   encoding=None, inf_rep='inf', verbose=True, 
                   freeze_panes=None, storage_options=None)

The to_excel() function writes a DataFrame to an Excel Worksheet.

For this example, a new Excel file emerges from the original file with some differences.

df = pd.read_excel('titanic.xlsx')
cols = ['Surname', 'First Names', 'Age']
df.to_excel('titanic1.xlsx', sheet_name='Sheet1', columns=cols, freeze_panes=(1,4))
  • Line [1] reads in the titanic.xlsx file created earlier.
  • Line [2] creates a list with the columns we want in the new Excel file.
  • Line [3] saves a new Excel file with the following options:
    • Indicates the Worksheet to read in (optional).
    • Sets the columns to show (see column list on Line [3]).
    • Freezes the first row, four columns (pane).

Output

Below are the first few lines of the file. Try scrolling to ensure the freeze pane option worked.

For a list of available parameters for this function, click here.


Excel Styler and Excel Writer

Function Outline

Styler.to_excel(excel_writer, sheet_name='Sheet1', na_rep='', 
                float_format=None, columns=None, header=True, 
                index=True, index_label=None, startrow=0, startcol=0, 
                engine=None, merge_cells=True, encoding=None, 
                inf_rep='inf', verbose=True, freeze_panes=None)

This function writes styles to an Excel Worksheet.

class pandas.ExcelWriter(path, engine=None, date_format=None, 
                         datetime_format=None, mode='w', 
                         storage_options=None, if_sheet_exists=None, 
                         engine_kwargs=None, **kwargs)

This function is a class for writing a DataFrame to an Excel Worksheet.

For this example, we will be creating a dictionary of employees for a small start-up IT company. Run the following code to create a new Excel file, salaries.xlsx.

df = pd.DataFrame({'Fname':     ['Anne', 'Mark', 'Paul', 'Ben', 'Micah'],
                 		'Salaries':   [98273,  84900,  120876, 52759, 99767],
                   		'Profit':       [.15,    .23,     .8 ,    .7,    .3]})

xw = pd.ExcelWriter("salaries.xlsx", engine='xlsxwriter')
df.to_excel(xw, sheet_name='Sheet1')

workbook  = xw.book
worksheet = xw.sheets['Sheet1']

sal = workbook.add_format({'num_format': '$#,##0.00'})
pro = workbook.add_format({'num_format': '0%'})

worksheet.set_column('B:B', 15)
worksheet.set_column('C:C', 25, sal)
worksheet.set_column('D:D', 26, pro)
xw.save()
  • Line [1] creates a new DataFrame: a dictionary of lists.
  • Line [2-3] sets up the Excel file, the engine, and the Worksheet.
  • Line [4-5] assigns the Workbook and Worksheet.
  • Line [6-7] sets up the formatting for the Salaries and Profit fields.
  • Line [8-10] sets the width of each column and the styles if needed.
  • Line [11] saves the new Excel file.

Output

For a list of available parameters for these functions, click here.