Python Working with Excel – Getting Started

To provide a solid understanding of data manipulation with Excel, the Finxter Academy has created a 6-part series to take the user from beginner to advanced. This article assumes that Microsoft Excel is available on the computer.


Quick Intro

In most cases, working with large amounts of data requires some manipulation to make it more readable for the general user. How do you accomplish this? Look no further! Python’s openpyxl library provides the tools to produce a well-formed Excel spreadsheet complete with:

  • custom heading(s) and/or sub-heading(s),
  • formatted column(s) and/or row(s),
  • sorted data,
  • simple/complex calculations, and much, much more.

Excel spreadsheets are user-friendly and do not require extensive knowledge of this software to perform the task(s) at hand.


Preparation

Before any data manipulation can occur, two (2) new libraries will require installation.

  • The Pandas library enables access to/from a DataFrame.
  • The Openpyxl library enables access to/From Excel.

To install these libraries, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.


$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

$ pip install openpyxl

Hit the <Enter> key on the keyboard to start the installation process.

If the installations were successful, a message displays in the terminal indicating the same.


Feel free to view the PyCharm installation guide for the required libraries.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd 
from openpyxl import load_workbook

Saving the Text File

Below are the contents of a text file. This file contains a sample representation of famous Jazz Musicians with fictitious Net Worths. 

First,Last,Birth Year,Birth State,Birth Country,Instrument,Est. Worth
Miles,Davis,1926,Illinois,USA,Trumpet,1343500
Duke,Ellington,1899,Washington,USA,Piano,1911400
John,Coltrane,1926,North  Carolina,USA,Saxophone,2575500
Dizzy,Gilespie,1917,South Carolina,USA,Trumpet,3690300
Ella,Fitzgerald,1917,Virgina,USA,Singer,8275780
Count,Basie,1904,New Jersey,USA,Piano,5145800
Fats,Waller,1904,New York,USA,Piano,3250930
Oscar,Peterson,1925,Montreal,CDA,Piano,7670945
Charlie,Parker,1920,Kansas City,USA,Saxophone,4375395
Billie,Holiday,1915,Philadelphia,USA,Singer,1560454

How to Save Data to CSV in Python?

A text file in its current format cannot be saved directly to Excel. Instead, there is an in-between step: save the text to a CSV file.

To accomplish this, perform the following steps:

  • Highlight all the Jazz Musicians (including the header row) using the mouse.
  • Press the CTRL+C key combination (copy data to clipboard).
  • Create an empty CSV file (use a text editor such as Notepad).
  • With the new CSV file open and active, press the CTRL+V key combination (paste data from clipboard).
  • Save the file as “j-greats.csv” to the current working directory.

How to Convert and Save a CSV to an Excel File in Python?

Before moving ahead, the CSV saved earlier requires conversion to Excel. The steps for this conversion are in the lines of code below.

read_file = pd.read_csv ('j-greats.csv')
read_file.to_excel ('j-greats.xlsx', index=False, header=True)
  • Line [1] reads the contents of the CSV file and saves the contents to read_file.
  • Line [2] passes three parameters to the  to_excel() method.
    • The first parameter is a filename (any-file-name.xlsx) and is required. For this example, the filename is j-greats.xlsx.
    • The second parameter (index=True/False/None) is not required. If this parameter is left out, the default value is None/False. None/True means no additional column on the left with row (index) numbers are added. If True, the column displays.
    • The third parameter (header=True/False) is not required. If this parameter is left out, the header row automatically displays.

If this code runs successfully, the current working directory now contains the j-greats.xlsx file.

Output (depending on parameters passed)

output Line [3] Possible Parameters: index=False/None, header=True

output Line [3] Possible Parameters: index=True, header=False


How to Load an Excel Workbook in Python?

Running the code below provides no visible output. However, the Workbook and Worksheet(s) are now accessible.

file_name = 'j-greats.xlsx'
w_book   = load_workbook(filename=file_name)
w_book.close()
  • Line [1] imports the load_workbook() method from the openpyxl library. This library is required to load the Workbook.
  • Line [2] assigns file_name as a reference to the existing j-greats.xlsx file.
  • Line [3] Loads in the Workbook (includes all associated Worksheets) and assigns the contents to w_book.
  • Line [4] closes the Workbook.

How to Determine the Active Openpyxl Worksheet – Single Worksheet

Each time a Workbook initially loads, the active Worksheet is, by default, assigned as the left-most Worksheet (tab 1).

Upon the creation of j-greats.xlsx, the following transpires:

  • Excel creates a Workbook named j-greats.xlsx.
  • Excel adds a Worksheet for this Workbook, Sheet1.
  • This Worksheet becomes the Active Worksheet.
  • Contents of CSV writes to Worksheet Sheet1.
  • The Workbook was saved and placed, for this example, in the current working directory.

To confirm the Active Worksheet, run the following code.

w_sheets = w_book.active
print(w_sheets.title)
w_book.close()

Output

Sheet1

Summary

In this article, we learned how to:

  • Create CSV file from plain text,
  • Install the required libraries,
  • Convert CSV to Excel,
  • Load a Workbook and,
  • Determine the Active Worksheet.

Next Up

Part 2 in the Working with Excel series will focus on how to:

  • Access Worksheet Data using Cell References
  • Access Worksheet Data via Iteration
  • Add Data to Worksheet
  • Modify Data on Worksheet
  • Delete Data from Worksheet