5 Best Ways to Convert CSV to YAML in Python

💡 Problem Formulation: Converting data from CSV (Comma Separated Values) to YAML (YAML Ain’t Markup Language) is a common task for Python developers who need a more human-readable format for configuration files or data serialization. The input is a CSV file with structured data and the desired output is an equally structured YAML file.

Method 1: Using the csv and yaml Modules

This method reads the CSV file with Python’s built-in csv module, collects the rows as a list of dictionaries, and dumps that list to YAML with the yaml module. The csv module ships with Python; the yaml module comes from the third-party PyYAML package (pip install pyyaml).

Here’s an example:

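import csv
import yaml  # provided by the PyYAML package

# newline='' lets the csv module handle line endings itself
with open('data.csv', 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    data = list(reader)  # one dictionary per CSV row

with open('data.yaml', 'w') as yamlfile:
    # default_flow_style=False forces block style output
    yaml.dump(data, yamlfile, default_flow_style=False)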

Output:

- header1: value1_1
  header2: value1_2
- header1: value2_1
  header2: value2_2

The code snippet reads a CSV file, converts each row to a dictionary, and appends it to a list. Then, the list of dictionaries is written to a YAML file, with default_flow_style=False to output in the preferred YAML block format.
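One detail worth knowing: csv.DictReader returns every field as a string, so numeric or boolean-looking columns end up as strings in the YAML. Below is a minimal sketch of coercing values before dumping. The coerce helper and the sample columns (name, port, enabled) are hypothetical and not part of the example above, and the conversion rules are only one possible convention.

```python
import csv
import io


def coerce(value):
    """Best-effort conversion of a CSV string to None, bool, int, or float."""
    if value is None:
        return None  # DictReader fills missing trailing fields with None
    text = value.strip()
    if text == "":
        return None  # treat empty cells as null
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return value  # anything else stays the original string


# Hypothetical sample data standing in for data.csv
sample = "name,port,enabled\nweb,8080,true\ndb,,false\n"
reader = csv.DictReader(io.StringIO(sample, newline=""))
records = [{key: coerce(val) for key, val in row.items()} for row in reader]
print(records)
```

The resulting records list can be passed to yaml.dump in the same way as data in the example above.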

Method 2: Using pandas and ruamel.yaml

With pandas for data manipulation and ruamel.yaml for outputting YAML, this method provides a powerful combination for dealing with more complex CSV files and generating highly customizable YAML output.

Here’s an example:

import pandas as pd
from ruamel.yaml import YAML

df = pd.read_csv('data.csv')
data = df.to_dict(orient='records')

yaml = YAML()
with open('data.yaml', 'w') as yamlfile:
    yaml.dump(data, yamlfile)

Output:

- header1: value1_1
  header2: value1_2
- header1: value2_1
  header2: value2_2

The code snippet uses pandas to read the CSV and convert it into a list of dictionaries, one per row. ruamel.yaml then writes this list to a YAML file; its YAML object exposes settings, such as indentation, for finer control over the output format if needed.

Method 3: Custom Conversion Function

This method uses a custom function that parses the CSV text manually. It gives the most control over the conversion process and allows for custom YAML structures.

Here’s an example:

import yaml

def csv_to_yaml(csv_text):
    lines = csv_text.splitlines()
    headers = lines[0].split(',')
    yaml_list = []
    for line in lines[1:]:
        values = line.split(',')
        yaml_list.append(dict(zip(headers, values)))
    return yaml.dump(yaml_list, default_flow_style=False)

csv_data = "header1,header2\nvalue1_1,value1_2\nvalue2_1,value2_2"
yaml_output = csv_to_yaml(csv_data)
print(yaml_output)

Output:

- header1: value1_1
  header2: value1_2
- header1: value2_1
  header2: value2_2

The code defines a csv_to_yaml function that takes a CSV string, splits it into a header line and data lines, and builds a list of dictionaries, one per CSV row. The YAML output is generated with the yaml.dump function.
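The plain split(',') above assumes that no field contains a comma, a quote, or a line break. As a sketch of a more tolerant variant, the parsing step can be delegated to csv.reader while keeping the same list-of-dictionaries shape. The function name csv_text_to_records and the sample text are illustrative assumptions.

```python
import csv
import io


def csv_text_to_records(csv_text):
    """Parse CSV text into a list of dicts, honouring quoted fields."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    headers = next(reader)  # first row holds the column names
    # csv.reader returns an empty list for a blank line, so skip those
    return [dict(zip(headers, row)) for row in reader if row]


# Hypothetical input: the second field of the first row contains a comma
csv_data = 'header1,header2\nvalue1_1,"a, quoted value"\nvalue2_1,value2_2'
records = csv_text_to_records(csv_data)
print(records)

# For comparison, naive splitting cuts the quoted field into two pieces
naive = csv_data.splitlines()[1].split(",")
print(naive)
```

The records list can then be handed to yaml.dump as in the function above.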

Method 4: Command-Line Conversion with csvkit and PyYAML

For those who prefer to work directly from the command line, csvkit (a suite of utilities for converting to and from CSV) can be combined with the PyYAML library to convert CSV to YAML in a single shell one-liner.

Here’s an example:

csvjson data.csv | python -c 'import sys, json, yaml; yaml.safe_dump(json.load(sys.stdin), sys.stdout, default_flow_style=False)'

Output:

- header1: value1_1
  header2: value1_2
- header1: value2_1
  header2: value2_2

This one-liner first converts the CSV file to JSON using csvjson, then pipes the JSON to a Python command that parses it with json.load and writes it out as YAML. yaml.safe_dump emits only standard YAML tags, so the result can be read by any YAML parser.
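The parsing step is what turns the piped text into row records. Here is a small sketch, assuming csvjson emits a JSON array with one object per CSV row. The json_text sample is hypothetical, and io.StringIO stands in for sys.stdin.

```python
import io
import json

# Hypothetical sample of the JSON that csvjson might emit for a two-row CSV
json_text = (
    '[{"header1": "value1_1", "header2": "value1_2"}, '
    '{"header1": "value2_1", "header2": "value2_2"}]'
)

# Parsing the stream yields a list of dictionaries, one per CSV row
parsed = json.load(io.StringIO(json_text))

# Iterating over the stream without parsing yields raw lines of text
raw_lines = list(io.StringIO(json_text))

print(type(parsed[0]).__name__, len(parsed))
print(type(raw_lines[0]).__name__, len(raw_lines))
```

Only the parsed form gives a YAML dumper the keys and values it needs to produce one mapping per row.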

Bonus One-Liner Method 5: Using xsv and yq

xsv is a fast CSV command-line toolkit, and yq is a lightweight and portable command-line YAML processor. This method combines their strengths for a quick and efficient conversion.

Here’s an example:

xsv select header1,header2 data.csv | yq -p=csv -o=yaml

Output:

- header1: value1_1
  header2: value1_2
- header1: value2_1
  header2: value2_2

This command uses xsv to select the columns to keep and emit them as CSV, then pipes the result to yq, which parses the CSV input (-p=csv) and prints YAML (-o=yaml). It assumes the Go-based yq (mikefarah/yq) in a version 4 release that supports CSV input. It’s a concise solution that utilizes specialized tools optimized for their tasks.

Summary/Discussion

  • Method 1: Using the csv and yaml Modules. Strengths: Simple; needs only the built-in csv module plus PyYAML. Weaknesses: Basic functionality with limited customization.
  • Method 2: Using pandas and ruamel.yaml. Strengths: Powerful, good for large or complex CSV files, customizable output. Weaknesses: Requires third-party libraries.
  • Method 3: Custom Conversion Function. Strengths: Highly customizable, great for tailored solutions. Weaknesses: Requires more code and manual effort; a plain comma split does not handle quoted fields or embedded commas.
  • Method 4: Command-Line Conversion with csvkit and PyYAML. Strengths: Quick one-liner suitable for shell scripting. Weaknesses: Depends on extra command-line utilities not in the standard Python distribution.
  • Method 5: Using xsv and yq. Strengths: Fast and efficient, suitable for those who prefer command-line tools. Weaknesses: Requires installation of command-line utilities that might not be available on all systems.