5 Best Ways to Convert CSV File to PDF File Using Python

💡 Problem Formulation: Converting a CSV file to PDF is useful when you need a formatted, portable document. For instance, you might have financial records in a CSV file that you need to present as a report. Our goal is to automate this process with Python, turning a CSV containing rows of data into a cleanly formatted PDF document.
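
Every snippet below reads a file named data.csv. If you need one to experiment with, a small stand-in can be written with the standard csv module. The column names and values are made up for illustration, and the two numeric columns give Method 2's bar chart something to plot.

```python
import csv

# Hypothetical sample records; any CSV with a header row works with the examples.
SAMPLE_ROWS = [
    ["month", "revenue", "expenses"],
    ["January", "1200", "800"],
    ["February", "1350", "900"],
    ["March", "1100", "950"],
]


def write_sample_csv(path: str = "data.csv") -> None:
    """Write SAMPLE_ROWS to `path` using the csv module."""
    # newline="" lets the csv writer control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(SAMPLE_ROWS)


write_sample_csv()
```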

Method 1: Using Pandas and ReportLab

Pairing Pandas for data handling with ReportLab for PDF generation is a robust method: Pandas reads a CSV file in a single call, and ReportLab gives fine-grained control over where and how everything is drawn on the page. This method is especially useful when the resulting PDF needs extensive, precise formatting.

Here’s an example:

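import pandas as pd
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

# Read CSV file
data_frame = pd.read_csv('data.csv')

# Create a PDF object with an explicit US Letter page size (612 x 792 points)
c = canvas.Canvas("data.pdf", pagesize=letter)
_, height = letter
y = height - 72  # first baseline, one inch below the top edge

# Simple text output: one line per row, starting a new page at the bottom margin
for _, row in data_frame.iterrows():
    if y < 72:
        c.showPage()
        y = height - 72
    c.drawString(72, y, ", ".join(str(value) for value in row.values))
    y -= 15

# Save the PDF
c.save()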

Output: The code will generate “data.pdf” with one line of text per CSV row.

This code snippet demonstrates the simplicity of traversing a Pandas DataFrame and outputting its rows onto a PDF file using ReportLab’s canvas drawing tools.
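
ReportLab's canvas places text at absolute coordinates in points (1/72 inch) measured up from the bottom-left corner, so a row-by-row loop has to compute each baseline itself and call showPage() once it reaches the bottom margin. The arithmetic can be sketched with the standard library alone, assuming US Letter pages, one-inch margins, and 15-point row spacing; the constants and function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical layout constants in points (1/72 inch), ReportLab's default unit.
PAGE_HEIGHT = 792.0  # height of a US Letter page, as in reportlab.lib.pagesizes.letter
MARGIN = 72.0        # one-inch top and bottom margins
LINE_HEIGHT = 15.0   # vertical spacing between consecutive rows


def rows_per_page(page_height: float = PAGE_HEIGHT,
                  margin: float = MARGIN,
                  line_height: float = LINE_HEIGHT) -> int:
    """Count the baselines from the top margin down to the bottom margin (inclusive)."""
    return int((page_height - 2 * margin) // line_height) + 1


def row_positions(n_rows: int,
                  page_height: float = PAGE_HEIGHT,
                  margin: float = MARGIN,
                  line_height: float = LINE_HEIGHT) -> List[Tuple[int, float]]:
    """Return (page_index, y) for each row; y is measured up from the bottom edge."""
    per_page = rows_per_page(page_height, margin, line_height)
    positions = []
    for i in range(n_rows):
        page, slot = divmod(i, per_page)
        positions.append((page, page_height - margin - slot * line_height))
    return positions


print(rows_per_page())        # 44
print(row_positions(45)[-1])  # (1, 720.0)
```

With these values 44 rows fit on a page, so the 45th row lands at the top of the second page, again at y = 720.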

Method 2: Using Pandas and Matplotlib

This method combines Pandas for reading the data with Matplotlib for plotting, and exports the resulting figure as a PDF. It is ideal when the PDF should present the CSV data visually, for example as a chart rather than as raw rows.

Here’s an example:

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages

# Read data from CSV
df = pd.read_csv('data.csv')

# Create a PDF pages object; the with-block closes the file when done
with PdfPages('data_plots.pdf') as pdf_pages:
    # Plot on an explicit Axes; DataFrame.plot() would otherwise open a new figure
    fig, ax = plt.subplots()
    df.plot(kind="bar", ax=ax)  # bar chart of the numeric columns
    ax.set_title('Bar Chart')

    # Save the figure as a page of the PDF, then release it
    pdf_pages.savefig(fig)
    plt.close(fig)

Output: The code will generate “data_plots.pdf” containing a bar chart of the CSV’s numeric columns.

This snippet turns CSV data into a chart and shows how Matplotlib’s PDF backend fits into a data pipeline, producing a shareable document from a dataset.
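
PdfPages can hold more than one figure: every savefig() call appends a new page. A sketch that writes one bar chart per numeric column, assuming a small in-memory CSV in place of data.csv; charts_to_pdf and the output filename are hypothetical.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages


def charts_to_pdf(df: pd.DataFrame, pdf_path: str) -> int:
    """Write one bar chart per numeric column, each on its own PDF page."""
    numeric = df.select_dtypes(include="number")
    with PdfPages(pdf_path) as pdf_pages:
        for column in numeric.columns:
            fig, ax = plt.subplots()
            numeric[column].plot(kind="bar", ax=ax)
            ax.set_title(column)
            pdf_pages.savefig(fig)  # appends a new page
            plt.close(fig)          # free the figure's memory
        return pdf_pages.get_pagecount()


# Hypothetical in-memory CSV standing in for data.csv
csv_text = "month,revenue,expenses\nJan,1200,800\nFeb,1350,900\n"
df = pd.read_csv(io.StringIO(csv_text), index_col="month")
pages = charts_to_pdf(df, "data_plots_by_column.pdf")
print(pages)  # 2
```

Using the month column as the index makes the month names the bar labels on each chart.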

Method 3: Using CSV and FPDF

FPDF is a minimalist PDF-generation library that, combined with Python’s built-in csv module, is well suited to quick and straightforward conversions. It works best when the PDF layout is relatively simple and does not require advanced formatting or charts.

Here’s an example:

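import csv
from fpdf import FPDF

# Create instance of FPDF class (A4 page, millimetre units by default)
pdf = FPDF()

# Add a page
pdf.add_page()

# Set font; core fonts such as Helvetica only support Latin-1 characters
pdf.set_font("Helvetica", size=12)

# newline='' is how the csv module expects files to be opened
with open('data.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        # Width 0 extends the cell to the right margin; ln=True moves to the next line
        pdf.cell(0, 10, txt=", ".join(row), ln=True)

# Save the PDF as data.pdf
pdf.output("data.pdf")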

Output: The code will generate “data.pdf” with each CSV row as a line in the PDF.

In this example, FPDF’s simplicity shines through, converting the CSV data row-by-row into a neat and tidy PDF, making it a dependable option for straightforward tasks.

Method 4: Using Tabula

Tabula (through the tabula-py package) is a library for extracting tables from PDFs, and that is the only direction it works in: its convert_into() function reads a PDF and writes CSV, TSV, or JSON, so it cannot create a PDF from a CSV file. It is still useful in a CSV-to-PDF workflow for checking that tabular data survived the conversion, by extracting the tables from the generated PDF back into CSV.

Here’s an example:

from tabula import convert_into

# tabula-py needs a Java runtime and only converts PDF -> CSV/TSV/JSON.
# Extract the tables from a generated PDF back into CSV for comparison.
convert_into("data.pdf", "data_roundtrip.csv", output_format="csv", pages="all")

Output: This will create “data_roundtrip.csv” containing the tables Tabula detects in “data.pdf”.

While Tabula is not a CSV-to-PDF converter, this snippet shows how it can verify table-based PDFs produced by one of the other methods.

Bonus One-Liner Method 5: Using DataFrame.to_html() and WeasyPrint

Chaining Pandas’ DataFrame.to_html() method with the WeasyPrint package turns a CSV into a PDF table with a single line of conversion code. This approach is great for those who prefer minimal code, provided the default table styling suits their needs.

Here’s an example:

import pandas as pd
from weasyprint import HTML

# Load data
data = pd.read_csv('data.csv')

# Convert to HTML then to PDF in one line
HTML(string=data.to_html()).write_pdf('data.pdf')

Output: This will result in a “data.pdf” containing the data as a plain HTML table rendered by WeasyPrint.

A testament to Python’s ecosystem of third-party packages (neither Pandas nor WeasyPrint is part of the standard library), this method compresses what could be several steps into a single, elegant line of code.
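
The default output is a plain table, but WeasyPrint renders CSS, including @page rules for paper size and margins, so styling can be added by embedding a stylesheet in the HTML before conversion. A sketch assuming a landscape A4 page and a simple bordered table; PRINT_CSS, styled_html, and the sample data are hypothetical.

```python
import html
import io

import pandas as pd

# Hypothetical print stylesheet; WeasyPrint applies @page rules and table CSS.
PRINT_CSS = """
@page { size: A4 landscape; margin: 1.5cm; }
body { font-family: sans-serif; font-size: 10pt; }
table { border-collapse: collapse; width: 100%; }
th, td { border: 1px solid #999; padding: 4px 6px; }
th { background: #eee; }
"""


def styled_html(df: pd.DataFrame, title: str) -> str:
    """Wrap DataFrame.to_html() in a full HTML document with an embedded stylesheet."""
    # index=False drops the row index; border=0 leaves the borders to the CSS
    table = df.to_html(index=False, border=0)
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<title>{html.escape(title)}</title><style>{PRINT_CSS}</style></head>"
        f"<body>{table}</body></html>"
    )


# Hypothetical data standing in for data.csv
df = pd.read_csv(io.StringIO("month,revenue\nJanuary,1200\nFebruary,1350\n"))
page = styled_html(df, "Revenue report")
```

The resulting string is converted exactly like the one-liner above, for example HTML(string=page).write_pdf('data.pdf').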

Summary/Discussion

  • Method 1: Pandas and ReportLab. Strengths: Highly customizable, great for extensive formatting. Weaknesses: Can become complex with advanced setups.
  • Method 2: Pandas and Matplotlib. Strengths: Ideal for visual representations, directly plots to PDF. Weaknesses: Primarily suited for data that is best represented graphically.
  • Method 3: CSV and FPDF. Strengths: Simple and straightforward, no-frills conversion. Weaknesses: Basic, lacks advanced formatting options.
  • Method 4: Tabula. Strengths: Handy for checking that tables survive a conversion by extracting them back to CSV. Weaknesses: Cannot create PDFs at all, since it only reads them, and requires Java.
  • Bonus Method 5: DataFrame.to_html() and WeasyPrint. Strengths: Extremely concise, very quick for simple conversions. Weaknesses: Default output is plain unless you add CSS, and WeasyPrint depends on system libraries such as Pango.