Over your career as a Data Scientist, there may be instances where you will work with data to/from a DataFrame to an Excel file. This article shows you how to manipulate this data using the above functions.
This article covers the commonly used parameters for each function listed above. For a complete list of all parameters and their use, click here.
β Note: This article assumes that Excel resides on the computer.
Preparation
Before any data manipulation can occur, one (1) library will require installation.
- The Pandas library enables access to/from a DataFrame.
To install this library, navigate to an IDE terminal. At the command prompt ($
), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($
). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter>
key on the keyboard to start the installation process.
If the installation was successful, a message displays in the terminal indicating the same.
Feel free to view the PyCharm installation guide for the required library.
Remember to add the Required Starter Code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd
Read Excel File
Function Outline
pandas.read_excel(io, sheet_name=0, header=0, names=None, index_col=None, usecols=None, squeeze=False, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, parse_dates=False, date_parser=None, thousands=None, comment=None, skipfooter=0, convert_float=None, mangle_dupe_cols=True, storage_options=None)
The read_excel()
function reads an Excel file into a pandas DataFrame.
For this example, we will save a list of the Titanicβs 2nd Class Passengers in Excel. To follow along, perform the following steps.
- Navigate to https://titanicfacts.net/titanic-passenger-list/#2ndclass.
- Highlight the 2nd Class Passenger list (including the header row).
- Press
CTRL+C
to copy the highlighted area to the system Clipboard. - Open Excel.
- Click inside the
A1
cell. - Press
CTRL+V
to paste the contents of the system Clipboard to the Excel Worksheet. - Save the Excel file as
titanic.xlsx
and place this file in the current working directory.
Below are the first few lines of the file.
To read this file into a DataFrame, run the following code.
df = pd.read_excel('titanic.xlsx') print(df.head(4))
- Line [1] reads in the
titanic.xlsx
file created earlier. - Line [2] outputs five rows: one header row and four data rows to the terminal.
Output
Surname | First Names | Age | Boarded | Survivor (S) or Victim (β ) | |
0 | Abelson | Mr Samuel | 30 | Cherbourg | β |
1 | Abelson | Mrs Hannah | 28 | Cherbourg | S |
2 | Aldworth | Mr Augustus Henry | 34 | Southampton | β |
3 | Andrew | Mr Frank Thomas | 25 | Southampton | β |
To hide the index column (far left column above), set the index_col parameter as follows:
df = pd.read_excel('titanic.xlsx', index_col=0)
Surname | First Names | Age | Boarded | Survivor (S) or Victim (β ) |
Abelson | Mr Samuel | 30 | Cherbourg | β |
Abelson | Mrs Hannah | 28 | Cherbourg | S |
Aldworth | Mr Augustus Henry | 34 | Southampton | β |
Andrew | Mr Frank Thomas | 25 | Southampton | β |
To change the header row a name to a number, set the header
parameter as follows:
df = pd.read_excel('titanic.xlsx', header=None)
0 | 1 | 2 | 3 | 4 | |
0 | Abelson | Mr Samuel | 30 | Cherbourg | β |
1 | Abelson | Mrs Hannah | 28 | Cherbourg | S |
2 | Aldworth | Mr Augustus Henry | 34 | Southampton | β |
3 | Andrew | Mr Frank Thomas | 25 | Southampton | β |
For a list of available parameters for this function, click here.
DataFrame to Excel
Function Outline
DataFrame.to_excel(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None, storage_options=None)
The to_excel()
function writes a DataFrame to an Excel Worksheet.
For this example, a new Excel file emerges from the original file with some differences.
df = pd.read_excel('titanic.xlsx') cols = ['Surname', 'First Names', 'Age'] df.to_excel('titanic1.xlsx', sheet_name='Sheet1', columns=cols, freeze_panes=(1,4))
- Line [1] reads in the
titanic.xlsx
file created earlier. - Line [2] creates a list with the columns we want in the new Excel file.
- Line [3] saves a new Excel file with the following options:
- Indicates the Worksheet to read in (optional).
- Sets the columns to show (see column list on Line [3]).
- Freezes the first row, four columns (pane).
Output
Below are the first few lines of the file. Try scrolling to ensure the freeze pane option worked.
For a list of available parameters for this function, click here.
Excel Styler and Excel Writer
Function Outline
Styler.to_excel(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None)
This function writes styles to an Excel Worksheet.
class pandas.ExcelWriter(path, engine=None, date_format=None, datetime_format=None, mode='w', storage_options=None, if_sheet_exists=None, engine_kwargs=None, **kwargs)
This function is a class for writing a DataFrame to an Excel Worksheet.
For this example, we will be creating a dictionary of employees for a small start-up IT company. Run the following code to create a new Excel file, salaries.xlsx
.
df = pd.DataFrame({'Fname': ['Anne', 'Mark', 'Paul', 'Ben', 'Micah'], 'Salaries': [98273, 84900, 120876, 52759, 99767], 'Profit': [.15, .23, .8 , .7, .3]}) xw = pd.ExcelWriter("salaries.xlsx", engine='xlsxwriter') df.to_excel(xw, sheet_name='Sheet1') workbook = xw.book worksheet = xw.sheets['Sheet1'] sal = workbook.add_format({'num_format': '$#,##0.00'}) pro = workbook.add_format({'num_format': '0%'}) worksheet.set_column('B:B', 15) worksheet.set_column('C:C', 25, sal) worksheet.set_column('D:D', 26, pro) xw.save()
- Line [1] creates a new DataFrame: a dictionary of lists.
- Line [2-3] sets up the Excel file, the engine, and the Worksheet.
- Line [4-5] assigns the Workbook and Worksheet.
- Line [6-7] sets up the formatting for the
Salaries
andProfit
fields. - Line [8-10] sets the width of each column and the styles if needed.
- Line [11] saves the new Excel file.
Output
For a list of available parameters for these functions, click here.