Python provides robust tools for handling Excel files, and this article will guide you through the process of parsing Excel sheets into different formats using Python.
To parse Excel sheets into text files and lists in Python, use the pandas library. Install pandas and openpyxl, then read the Excel file with pd.read_excel()
. Convert the DataFrame to a text file using to_csv()
or to a two-dimensional list with values.tolist()
. For specific columns, use either iloc
with column indices for unnamed sheets or usecols
with column letters. This method efficiently transforms Excel data into versatile Python structures, enabling customized data manipulation.
Let’s dive into more details next! π
How to Parse an Excel Sheet into a Text File Using Python?
To parse an Excel sheet into a text file using Python, you need the pandas
library. Here’s a step-by-step process:
- Install necessary libraries: Use
pip
to installpandas
andopenpyxl
. - Read the Excel file: Use
pandas
to load the Excel file. - Write to a text file: Convert the data to a text format and write it to a text file.
Here’s the sample code:
import pandas as pd # Load the Excel file excel_file = 'your_file.xlsx' # Replace with your Excel file path sheet_name = 'Sheet1' # Replace with your sheet name df = pd.read_excel(excel_file, sheet_name=sheet_name) # Convert the dataframe to text (string) text_data = df.to_csv(sep='\t', index=False) # Write to a text file txt_file = 'your_file.txt' # Replace with your text file name with open(txt_file, 'w') as file: file.write(text_data)
This code reads an Excel file and writes its content into a tab-separated text file. Replace 'your_file.xlsx'
, 'Sheet1'
, and 'your_file.txt'
with the actual file names and sheet name.
How to Store Excel Data in a Two-Dimensional List?
If you want to store Excel data in a 2D list where each sublist represents a row, modify the code like this:
import pandas as pd # Load the Excel file excel_file = 'your_file.xlsx' # Replace with your file path sheet_name = 'Sheet1' # Replace with your sheet name df = pd.read_excel(excel_file, sheet_name=sheet_name) # Convert DataFrame to 2D list data_2d_list = df.values.tolist() # Output the 2D list for row in data_2d_list: print(row)
This code converts the DataFrame into a list of lists, where each row becomes a sublist.
Selecting Columns by Index in an Unnamed Excel Sheet
To select columns by their index in an unnamed Excel sheet, use pandas’ integer-location based indexing. Here’s how:
import pandas as pd # Load the Excel file excel_file = 'your_file.xlsx' # Replace with your file path sheet_name = 'Sheet1' # Replace with your sheet name df = pd.read_excel(excel_file, sheet_name=sheet_name, header=None) # Select columns by index - e.g., first and second columns df_selected_columns = df.iloc[:, [0, 1]] # Convert to 2D list and output data_2d_list = df_selected_columns.values.tolist() for row in data_2d_list: print(row)
.iloc
is used here to select columns based on their integer location.
Selecting Specific Columns by Excel Column Letters
To select specific columns by their Excel letter identifiers, use the usecols
parameter:
import pandas as pd # Load the Excel file with specific columns excel_file = 'your_file.xlsx' # Replace with your file path sheet_name = 'Sheet1' # Replace with your sheet name cols_to_use = ['A', 'C'] # Replace with the columns you need df = pd.read_excel(excel_file, sheet_name=sheet_name, usecols=cols_to_use, header=None) # Convert to 2D list and output data_2d_list = df.values.tolist() for row in data_2d_list: print(row)
In this code, usecols=cols_to_use
instructs pd.read_excel()
to read only the specified columns. Adjust cols_to_use
as needed.