Parsing Excel Sheets into Text Files and Lists with Python

Python provides robust tools for handling Excel files, and this article will guide you through the process of parsing Excel sheets into different formats using Python.

To parse Excel sheets into text files and lists in Python, use the pandas library. Install pandas and openpyxl, then read the Excel file with pd.read_excel(). Convert the DataFrame to a text file using to_csv() or to a two-dimensional list with values.tolist(). For specific columns, use either iloc with column indices for unnamed sheets or usecols with column letters. This method efficiently transforms Excel data into versatile Python structures, enabling customized data manipulation.

Let’s dive into more details next! πŸ‘‡

How to Parse an Excel Sheet into a Text File Using Python?

To parse an Excel sheet into a text file using Python, you need the pandas library. Here’s a step-by-step process:

  1. Install necessary libraries: Use pip to install pandas and openpyxl.
  2. Read the Excel file: Use pandas to load the Excel file.
  3. Write to a text file: Convert the data to a text format and write it to a text file.

Here’s the sample code:

import pandas as pd

# Load the Excel file
excel_file = 'your_file.xlsx'  # Replace with your Excel file path
sheet_name = 'Sheet1'  # Replace with your sheet name
df = pd.read_excel(excel_file, sheet_name=sheet_name)

# Convert the dataframe to text (string)
text_data = df.to_csv(sep='\t', index=False)

# Write to a text file
txt_file = 'your_file.txt'  # Replace with your text file name
with open(txt_file, 'w') as file:
    file.write(text_data)

This code reads an Excel file and writes its content into a tab-separated text file. Replace 'your_file.xlsx', 'Sheet1', and 'your_file.txt' with the actual file names and sheet name.

How to Store Excel Data in a Two-Dimensional List?

If you want to store Excel data in a 2D list where each sublist represents a row, modify the code like this:

import pandas as pd

# Load the Excel file
excel_file = 'your_file.xlsx'  # Replace with your file path
sheet_name = 'Sheet1'  # Replace with your sheet name
df = pd.read_excel(excel_file, sheet_name=sheet_name)

# Convert DataFrame to 2D list
data_2d_list = df.values.tolist()

# Output the 2D list
for row in data_2d_list:
    print(row)

This code converts the DataFrame into a list of lists, where each row becomes a sublist.

Selecting Columns by Index in an Unnamed Excel Sheet

To select columns by their index in an unnamed Excel sheet, use pandas’ integer-location based indexing. Here’s how:

import pandas as pd

# Load the Excel file
excel_file = 'your_file.xlsx'  # Replace with your file path
sheet_name = 'Sheet1'  # Replace with your sheet name
df = pd.read_excel(excel_file, sheet_name=sheet_name, header=None)

# Select columns by index - e.g., first and second columns
df_selected_columns = df.iloc[:, [0, 1]]

# Convert to 2D list and output
data_2d_list = df_selected_columns.values.tolist()
for row in data_2d_list:
    print(row)

.iloc is used here to select columns based on their integer location.

Selecting Specific Columns by Excel Column Letters

To select specific columns by their Excel letter identifiers, use the usecols parameter:

import pandas as pd

# Load the Excel file with specific columns
excel_file = 'your_file.xlsx'  # Replace with your file path
sheet_name = 'Sheet1'  # Replace with your sheet name
cols_to_use = ['A', 'C']  # Replace with the columns you need

df = pd.read_excel(excel_file, sheet_name=sheet_name, usecols=cols_to_use, header=None)

# Convert to 2D list and output
data_2d_list = df.values.tolist()
for row in data_2d_list:
    print(row)

In this code, usecols=cols_to_use instructs pd.read_excel() to read only the specified columns. Adjust cols_to_use as needed.