5 Best Ways to Export to CSV with Leading Zeros in Python

πŸ’‘ Problem Formulation: When exporting data to a CSV file using Python, it’s common to lose leading zeros in fields such as ZIP codes or ID numbers. For example, the ID ‘00123’ may be exported as ‘123’, which could lead to data integrity issues. This article discusses methods to retain these leading zeros in CSV exports.

Method 1: Using the csv Module with String Formatting

One way to preserve leading zeros when exporting to a CSV is by converting integers to strings with leading zeros in the format method. The csv module’s writer object can then handle the formatted strings properly.

Here’s an example:

import csv

data = [[1, '00456'], [2, '07891']]
with open('output.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    for row in data:
        writer.writerow([f'{row[0]:02}', row[1]])

Output CSV content:

01,00456
02,07891

In the code snippet, the list data contains rows with ID numbers that need to retain leading zeros. We format the first column with f'{row[0]:02}', which adds leading zeros for a total width of two characters. The csv.writer processes each row and exports to ‘output.csv’ with the leading zeros intact.

Method 2: Using pandas DataFrame with String Conversion

The pandas library provides a versatile DataFrame object, which can be used to ensure that data with leading zeros is treated as a string data type during CSV export. This keeps the leading zeros visible in the output file.

Here’s an example:

import pandas as pd

df = pd.DataFrame({'ID': ['00123', '04567'], 'Name': ['Alice', 'Bob']})
df.to_csv('output_with_pandas.csv', index=False)

Output CSV content:

ID,Name
00123,Alice
04567,Bob

The pandas.DataFrame is created with columns ‘ID’ and ‘Name’, deliberately treating ‘ID’ values as strings. When using df.to_csv(), the CSV is exported so that IDs are in string format, preserving the leading zeros.

Method 3: Using Excel-Compatible Formatting in pandas

Excel may automatically strip leading zeros when opening CSV files. To prevent this, pandas can be used to export data with additional formatting that forces Excel to interpret values as strings.

Here’s an example:

import pandas as pd

df = pd.DataFrame({'ID': ['00123', '04567'], 'Name': ['Alice', 'Bob']})
df.to_csv('output_excel_compatible.csv', index=False, quoting=csv.QUOTE_NONNUMERIC)

Output CSV content:

"ID","Name"
"00123","Alice"
"04567","Bob"

The parameter quoting=csv.QUOTE_NONNUMERIC in df.to_csv() ensures all non-numeric fields are enclosed in quotes, signaling Excel to treat these fields as text and keep the leading zeros.

Method 4: Pre-formatting Data Before Export

Manually pre-formatting the data as strings within Python before writing to a CSV file can keep the leading zeros. This requires converting numbers to strings with the desired zero-padding ahead of time.

Here’s an example:

import csv

data = [[12, 123], [5, 67]]
formatted_data = [[f'{id:05}', f'{num:08}'] for id, num in data]

with open('preformatted_output.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(formatted_data)

Output CSV content:

00012,00000123
00005,00000067

The list comprehension creates a new list, formatted_data, where each numeric value is formatted with leading zeros to a fixed width. The resulting strings are then written to the CSV file without losing any formatting.

Bonus One-Liner Method 5: Using zfill()

The zfill() method in Python adds leading zeros to a string until it reaches the specified length. This one-liner can quickly ensure numbers are exported with leading zeros.

Here’s an example:

import csv

data = [123, 45]
with open('one_liner_output.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows([[str(num).zfill(5)] for num in data])

Output CSV content:

00123
00045

Each number in data is converted to a string and then zfill(5) is used to pad the string with zeros up to a length of five. This transformation is performed inline as the CSV rows are written.

Summary/Discussion

  • Method 1: Using the csv Module with String Formatting. Strengths: Built into Python, no external libraries needed. Weaknesses: Requires manual formatting for each field.
  • Method 2: Using pandas DataFrame with String Conversion. Strengths: Simple and concise with a powerful data handling library. Weaknesses: Adds external library dependency.
  • Method 3: Using Excel-Compatible Formatting in pandas. Strengths: Compatible with Excel’s auto-formatting. Weaknesses: Specific to Excel and thus may not be ideal for all CSV applications.
  • Method 4: Pre-formatting Data Before Export. Strengths: Gives complete control over formatting. Weaknesses: Can be cumbersome for large datasets.
  • Method 5: One-Liner Using zfill(). Strengths: Quick and simple for individual values. Weaknesses: Less flexible for complex formatting requirements.