5 Best Ways to Convert CSV to PDF in Python

💡 Problem Formulation: Python developers often need to transform data from a CSV file into a more presentable PDF format, whether it’s for reporting, sharing data with non-technical stakeholders, or just for better readability. Imagine you have a CSV file containing sales data and you want to create a summarized report in PDF format that can be easily distributed. This article will guide you through five methods to effectively convert CSV files to PDFs with Python.

Method 1: Using Pandas and ReportLab

Pandas is a powerful data manipulation library in Python, and ReportLab is a toolkit for creating PDFs. By combining these two, you can load your CSV data with Pandas, easily manipulate and format it, and then use ReportLab to output a PDF. This method provides robust data processing capabilities alongside flexible PDF generation.

Here’s an example:

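import pandas as pd
from reportlab.pdfgen import canvas

data = pd.read_csv('sales_data.csv')
c = canvas.Canvas("sales_report.pdf")

# Column names first, then one line of text per data row.
lines = [", ".join(map(str, data.columns))]
lines += [", ".join(map(str, row)) for row in data.itertuples(index=False)]

y = 720  # starting height in points, measured from the bottom of the page
for line in lines:
    if y < 72:  # bottom margin reached, so continue on a new page
        c.showPage()
        y = 720
    c.drawString(72, y, line)
    y -= 14  # leading for the default 12 pt font

c.save()

In this example, the PDF ‘sales_report.pdf’ contains the header and every row of ‘sales_data.csv’, one line of text per row.

The code snippet uses pd.read_csv() to load the CSV data into a DataFrame. The canvas.Canvas() class from ReportLab then creates a PDF canvas, onto which the column names and each row are drawn with c.drawString(). Coordinates are in points measured from the bottom-left corner of the page, so the script tracks the vertical position itself and calls c.showPage() to start a new page when it runs out of room. Finally, c.save() writes the file.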
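If you would rather have a real table than lines of text, ReportLab's higher-level platypus layer can lay one out and split it across pages for you. The following is a minimal sketch, not a definitive implementation: the helper names table_cells() and write_table_pdf() are made up for this illustration, and the colours and line widths are arbitrary choices.

```python
def table_cells(header, rows):
    """Return a list of rows of strings, header first (hypothetical helper)."""
    cells = [[str(name) for name in header]]
    cells.extend([str(value) for value in row] for row in rows)
    return cells


def write_table_pdf(cells, pdf_path):
    """Lay the cells out as a bordered table that may span several pages."""
    # Imported here so table_cells() can be used without ReportLab installed.
    from reportlab.lib import colors
    from reportlab.lib.pagesizes import letter
    from reportlab.platypus import SimpleDocTemplate, Table, TableStyle

    table = Table(cells, repeatRows=1)  # repeat the header row on every page
    table.setStyle(TableStyle([
        ("GRID", (0, 0), (-1, -1), 0.5, colors.grey),
        ("BACKGROUND", (0, 0), (-1, 0), colors.lightgrey),
        ("FONTNAME", (0, 0), (-1, 0), "Helvetica-Bold"),
    ]))
    SimpleDocTemplate(pdf_path, pagesize=letter).build([table])


# Small made-up sample so the conversion step can be inspected on its own.
cells = table_cells(["region", "units"], [("North", 120), ("South", 95)])
print(cells)
```

With the DataFrame from the example above, a call such as write_table_pdf(table_cells(data.columns, data.itertuples(index=False)), "sales_report.pdf") should produce the report.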

Method 2: Using CSV Module and FPDF

Python’s built-in csv module reads and writes CSV files, and the FPDF library generates PDFs. This method is straightforward and suits those who want to control column widths manually and add custom styling to the PDF output.

Here’s an example:

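import csv
from fpdf import FPDF

pdf = FPDF()
pdf.add_page()
pdf.set_font("Arial", size=12)

# newline='' lets the csv module handle line endings inside quoted fields
with open('sales_data.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        # A width of 0 stretches the cell to the right margin
        pdf.cell(0, 10, txt=", ".join(row), ln=True)

pdf.output("sales_report.pdf")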

The resultant PDF ‘sales_report.pdf’ lists all the rows from the CSV file ‘sales_data.csv’.

After initializing the PDF object and setting up the page and font properties, the script reads the CSV contents row by row using csv.reader() and adds each row to the PDF using pdf.cell(). Finally, pdf.output() is called to generate the PDF.
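To make use of the manual control over column widths mentioned above, each field can be drawn in its own bordered cell. This is only a sketch under a few assumptions: the rows are rectangular (every row has the same number of fields), the page is FPDF's default A4 portrait with 10 mm margins (190 mm of usable width), and width is shared out by character count, which is a rough approximation. The helper names column_widths() and write_grid_pdf() are hypothetical.

```python
import csv
import io


def column_widths(rows, total_width=190.0):
    """Split total_width (mm) across columns in proportion to the longest value."""
    # "or 1" keeps an all-empty column from getting zero width.
    longest = [max(len(value) for value in column) or 1 for column in zip(*rows)]
    scale = total_width / sum(longest)
    return [length * scale for length in longest]


def write_grid_pdf(rows, pdf_path):
    """Draw each CSV field in its own bordered cell."""
    from fpdf import FPDF  # imported here so column_widths() runs without FPDF

    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Helvetica", size=10)
    widths = column_widths(rows)
    for row in rows:
        for width, value in zip(widths, row):
            pdf.cell(width, 8, value, 1)  # positional args: w, h, text, border
        pdf.ln(8)  # move down one row and back to the left margin
    pdf.output(pdf_path)


# Made-up sample data standing in for sales_data.csv.
sample = io.StringIO("region,units\nNorth,120\nSouth,95\n")
rows = list(csv.reader(sample))
print(column_widths(rows, total_width=100.0))
```

Calling write_grid_pdf(rows, "sales_report.pdf") with rows read from the real file would then write the grid. Values much longer than their column will still overflow the cell, so treat the width calculation as a starting point.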

Method 3: Using PyQt

PyQt is a set of Python bindings for Qt libraries, which can be used to create desktop applications. It can also be used to convert CSV data to PDF by painting on a printer object. This method enables the creation of intricate PDF layouts because you can leverage the Qt layout engine.

Here’s an example:

import csv
import sys

from PyQt5.QtPrintSupport import QPrinter
from PyQt5.QtWidgets import QApplication, QTextEdit

app = QApplication(sys.argv)  # widgets need an application object to exist
printer = QPrinter(QPrinter.HighResolution)
printer.setOutputFormat(QPrinter.PdfFormat)
printer.setOutputFileName("sales_report.pdf")

with open('sales_data.csv', 'r', newline='') as file:
    text_content = '\n'.join(', '.join(row) for row in csv.reader(file))

editor = QTextEdit()
editor.setPlainText(text_content)
# Printing is synchronous, so the script does not start the event loop;
# calling app.exec_() here would block forever because nothing quits it.
editor.print_(printer)

The ‘sales_report.pdf’ file is a PDF version of the ‘sales_data.csv’ contents.

The PyQt5 application initializes, and a printer object is set to output a PDF. A QTextEdit widget is used to set the CSV content as plain text. The editor.print_() method is then called to print the content to the PDF through the printer object.
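Qt's rich-text engine also understands a subset of HTML, so the same printer setup can render an actual table instead of plain text. The sketch below assumes the CSV has at least a header row; rows_to_html() and print_html_to_pdf() are hypothetical names chosen for this illustration, and the table attributes are arbitrary styling choices.

```python
import html


def rows_to_html(rows):
    """Build an HTML table from rows of strings; the first row is the header."""
    header, *body = rows
    parts = ['<table border="1" cellspacing="0" cellpadding="4">']
    parts.append("<tr>" + "".join(f"<th>{html.escape(h)}</th>" for h in header) + "</tr>")
    for row in body:
        parts.append("<tr>" + "".join(f"<td>{html.escape(v)}</td>" for v in row) + "</tr>")
    parts.append("</table>")
    return "\n".join(parts)


def print_html_to_pdf(markup, pdf_path):
    """Render HTML markup to a PDF file through a QTextDocument."""
    # Imported here so rows_to_html() can run without PyQt5 installed.
    import sys
    from PyQt5.QtGui import QTextDocument
    from PyQt5.QtPrintSupport import QPrinter
    from PyQt5.QtWidgets import QApplication

    # Reuse an existing application object if there is one; keep a reference.
    app = QApplication.instance() or QApplication(sys.argv)
    printer = QPrinter(QPrinter.HighResolution)
    printer.setOutputFormat(QPrinter.PdfFormat)
    printer.setOutputFileName(pdf_path)
    document = QTextDocument()
    document.setHtml(markup)
    document.print_(printer)


# Made-up sample rows; html.escape() keeps characters such as & safe.
markup = rows_to_html([["region", "units"], ["North & East", "120"]])
print(markup)
```

Passing list(csv.reader(file)) to rows_to_html() and the result to print_html_to_pdf() would replace the QTextEdit step in the example above.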

Method 4: Using tablib and WeasyPrint

Tablib is a module for handling tabular data that can export a dataset to several formats, including HTML. It does not produce PDFs itself, but WeasyPrint, a rendering engine that turns HTML and CSS into PDF, can convert that HTML output. Because the result can be styled with ordinary CSS, this approach is especially useful for web applications.

Here’s an example:

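import tablib
from weasyprint import HTML

data = tablib.Dataset()
with open('sales_data.csv', 'r', newline='') as f:
    data.csv = f.read()  # parses the text; the first row becomes the headers

# data.html renders the dataset as an HTML <table>
# (recent tablib versions may need the optional "html" extra for this)
with open('sales_report.html', 'w') as f:
    f.write(data.html)

HTML(filename='sales_report.html').write_pdf('sales_report.pdf')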

The code generates a PDF file ‘sales_report.pdf’ from the HTML conversion of ‘sales_data.csv’ data.

We use tablib to load the CSV data and then export it to a temporary HTML file with data.html. WeasyPrint then converts this HTML file to a PDF with the HTML().write_pdf() method.
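Since WeasyPrint renders HTML with CSS, the table can be styled and the intermediate file skipped by passing the markup as a string. The following is a sketch rather than a recommended layout: build_page(), write_styled_pdf() and the PAGE_CSS rules are all assumptions made for this example.

```python
import html

# Arbitrary example styling: landscape A4 pages and a bordered table.
PAGE_CSS = """
@page { size: A4 landscape; margin: 1.5cm; }
table { border-collapse: collapse; width: 100%; font-family: sans-serif; }
th, td { border: 1px solid #999; padding: 4px 8px; }
th { background: #eee; }
"""


def build_page(table_html, title):
    """Wrap an HTML table fragment in a minimal, complete document."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body><h1>{html.escape(title)}</h1>\n{table_html}\n</body></html>"
    )


def write_styled_pdf(table_html, title, pdf_path):
    """Render the wrapped table to PDF with the stylesheet above."""
    # Imported here so build_page() can run without WeasyPrint installed.
    from weasyprint import CSS, HTML

    page = build_page(table_html, title)
    HTML(string=page).write_pdf(pdf_path, stylesheets=[CSS(string=PAGE_CSS)])


# Made-up table fragment standing in for tablib's data.html output.
page = build_page("<table><tr><td>North</td></tr></table>", "Sales & Returns")
print(page)
```

With the dataset from the example above, write_styled_pdf(data.html, "Sales report", "sales_report.pdf") should produce the styled PDF without writing sales_report.html to disk.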

Bonus One-Liner Method 5: Using pandas and pdfkit

For those who prefer a quicker, less code-intensive solution, pdfkit can convert HTML to PDF while pandas can convert CSV to HTML. This one-liner approach essentially pipes CSV through pandas to HTML, then uses pdfkit to make a PDF.

Here’s an example:

import pandas as pd; import pdfkit; pdfkit.from_string(pd.read_csv('sales_data.csv').to_html(), 'sales_report.pdf')

This one-liner command creates the PDF file ‘sales_report.pdf’ from ‘sales_data.csv’ content.

The pandas read_csv() method is combined with to_html() to convert the CSV data directly to an HTML string, which pdfkit then turns into a PDF using from_string().
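Because pdfkit only wraps the external wkhtmltopdf program, the one-liner fails if that program is missing. A slightly longer version can check for it first and pass a few options along. This is a sketch: find_wkhtmltopdf() and csv_to_pdf() are hypothetical helper names, and the option values are example choices.

```python
import shutil


def find_wkhtmltopdf(binary="wkhtmltopdf"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(binary)


def csv_to_pdf(csv_path, pdf_path):
    """Convert a CSV file to PDF, failing early with a readable message."""
    # Imported here so find_wkhtmltopdf() can run without pandas or pdfkit.
    import pandas as pd
    import pdfkit

    binary = find_wkhtmltopdf()
    if binary is None:
        raise RuntimeError("wkhtmltopdf was not found on PATH; install it first")
    config = pdfkit.configuration(wkhtmltopdf=binary)
    table = pd.read_csv(csv_path).to_html(index=False)  # omit the row index
    options = {"page-size": "A4", "encoding": "UTF-8"}
    pdfkit.from_string(table, pdf_path, options=options, configuration=config)


print(find_wkhtmltopdf() or "wkhtmltopdf is not installed")
```

Calling csv_to_pdf("sales_data.csv", "sales_report.pdf") then does the same job as the one-liner, with index=False dropping the extra index column that to_html() includes by default.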

Summary/Discussion

Method 1: Pandas and ReportLab. Combines strong data processing with flexible PDF generation. Requires understanding both libraries, and you position text and handle page breaks yourself, so it can be verbose for simple tasks.
Method 2: CSV Module and FPDF. Straightforward, with manual control over the PDF layout. Offers little automatic styling or formatting. Good for simple customization.
Method 3: PyQt. Leverages a powerful GUI library for PDF creation and allows complex layouts. Overkill for simple tasks, and it brings a heavier set of dependencies.
Method 4: tablib and WeasyPrint. Well suited to web applications: the CSV is exported to HTML, which WeasyPrint renders as a styled PDF. Requires installing WeasyPrint, which may pull in additional system dependencies.
Bonus Method 5: Pandas and pdfkit. A quick one-liner. Relies on a correct installation of both pdfkit and wkhtmltopdf. Offers fewer customization options.