5 Best Ways to Convert Excel to CSV in Python

πŸ’‘ Problem Formulation: Converting Excel files to CSV format is a common task for developers dealing with data interchange between applications. Excel files (.xlsx or .xls) are often used due to their rich features, but the need to process or exchange data in a more minimalist and widely accepted format such as CSV (Comma-Separated Values) arises. For this article, we assume the input is an Excel file named ‘data.xlsx’ with multiple sheets, and the desired output is a series of CSV files representing each sheet.

Method 1: Using pandas DataFrame

This method involves the pandas library, which provides a powerful DataFrame object for data manipulation. The read_excel() function reads the Excel file into a DataFrame, and the to_csv() function then exports the DataFrame to a CSV file. Pandas allow precise control over the CSV output including specifying the delimiter, handling special characters, and managing encoding issues.

Here’s an example:

import pandas as pd

# Load the Excel file
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')

# Save to CSV
df.to_csv('data.csv', index=False)

Output: A CSV file named ‘data.csv’ containing the data from the ‘Sheet1’ of the Excel file.

The code snippet loads the ‘Sheet1’ of an Excel file into a pandas DataFrame and then exports this DataFrame to a CSV file without including row indices. This method is both powerful and flexible, suitable for most needs.

Method 2: Using openpyxl and CSV module

For those without pandas installed, Python’s built-in CSV module in combination with the openpyxl library (for handling .xlsx files) is a robust alternative. This method involves reading an Excel sheet’s rows and writing them to a CSV file using standard Python I/O and the csv.writer function.

Here’s an example:

import csv
from openpyxl import load_workbook

workbook = load_workbook('data.xlsx')
worksheet = workbook.active

with open('data.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    for row in worksheet.iter_rows(values_only=True):
        csvwriter.writerow(row)

Output: A CSV file named ‘data.csv’ containing the data from the active sheet of the Excel file.

The code reads the active sheet from the Excel file and iterates over its rows, writing each row to a CSV file using csv.writer. It’s an excellent choice if pandas is not an option, offers fine-grained control of CSV export but requires more code.

Method 3: Using xlrd and csv

The xlrd library is a tool for reading data and formatting information from Excel files, which in collaboration with Python’s csv module can be used to convert Excel files to CSV. While it offers less functionality compared to pandas and openpyxl, xlrd is a viable option for older Excel formats or when other libraries are not available.

Here’s an example:

import xlrd
import csv

workbook = xlrd.open_workbook('data.xlsx')
sheet = workbook.sheet_by_index(0)

with open('data.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    for row_num in range(sheet.nrows):
        csvwriter.writerow(sheet.row_values(row_num))

Output: A CSV file named ‘data.csv’ containing the data from the first sheet (index 0) of the Excel file.

This script opens an Excel file with xlrd, accesses the first sheet, and iterates over its rows to write them into a CSV file using the csv module. Best suited for simple conversion tasks without the need for advanced features.

Method 4: Using xlwings

xlwings is a BSD-licensed Python library that makes it easy to call Python from Excel and vice versa. It’s particularly useful in converting Excel files to CSV when working on a platform that supports COM, such as Windows.

Here’s an example:

import xlwings as xw

book = xw.Book('data.xlsx')
sheet = book.sheets['Sheet1']
sheet.api.SaveAs('data.csv', FileFormat=6) # FileFormat=6 stands for CSV

Output: A CSV file named ‘data.csv’ containing the data from ‘Sheet1’ of the Excel file.

This code uses xlwings to open an Excel file, selects a specific sheet by name, and saves it as a CSV file. It’s a convenient method for Windows users but is less portable due to its reliance on Microsoft Office’s COM automation.

Bonus One-Liner Method 5: Using pandas with one-liner

Pandas offers a way to convert Excel to CSV in a compact one-liner that can be appealing for quick scripts or command-line usage. It encapsulates everything in a single statement, great for simple conversions with minimal complexity.

Here’s an example:

pd.read_excel('data.xlsx', sheet_name=None).to_csv('data.csv', index=False)

Output: A CSV file named ‘data.csv’ containing the data from all sheets of the Excel file.

This line of code reads an Excel file with all its sheets and immediately converts it to CSV. Since sheet_name=None is used, it will convert all sheets into one CSV. It’s a quick and dirty method when detailed control is not required.

Summary/Discussion

Method 1: Using pandas DataFrame. Most flexible and powerful. Requires pandas installation.
Method 2: Using openpyxl and CSV module. Good control without pandas. Requires more code.
Method 3: Using xlrd and csv. Suitable for older Excel formats or minimal installations.
Method 4: Using xlwings. Convenient for Windows users with Excel. Less portable due to COM dependency.
Bonus Method 5: Pandas one-liner. Great for quick conversions but lacks advanced features.