Converting HTML to PDF is a common requirement for software developers who need to generate reports, invoices, or other documents from web content. The challenge is to transform an HTML string, which defines the structure and presentation of a web page, into a PDF document, which is a fixed-format, portable file. For instance, you might convert the HTML of an invoice template into a downloadable PDF invoice.
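Every method in this article starts from an HTML string. As a minimal sketch of where such a string might come from, the following fills an invoice template using only the standard library. The template text, the field names, and the render_invoice helper are hypothetical and exist only for illustration.

```python
from html import escape
from string import Template

# Hypothetical invoice template; the placeholders are illustrative only.
INVOICE_TEMPLATE = Template(
    "<h1>Invoice $number</h1>"
    "<p>Customer: $customer</p>"
    "<p>Total: $total</p>"
)


def render_invoice(number: str, customer: str, total: str) -> str:
    """Return an HTML string with the supplied values HTML-escaped."""
    return INVOICE_TEMPLATE.substitute(
        number=escape(number),
        customer=escape(customer),
        total=escape(total),
    )


html_string = render_invoice("2024-001", "Smith & Sons", "$120.00")
print(html_string)
```

Escaping the values keeps characters such as & or < in customer data from breaking the markup before it reaches a PDF converter.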
Method 1: Using WeasyPrint
WeasyPrint is a visual rendering engine for HTML and CSS that can output to PDF. It is designed for web developers who want to generate high-quality printable documents, and it supports a large part of modern CSS, including paged-media features such as page sizes and margins. It is written in Python, which makes integration into existing Python applications straightforward.
Here’s an example:
from weasyprint import HTML
html_string = '<h1>Hello, WeasyPrint!</h1>'
HTML(string=html_string).write_pdf('output.pdf')
Output: A PDF document with the content “Hello, WeasyPrint!” as a heading.
This code snippet creates an HTML object from the string variable and then generates a PDF from that object, storing it as ‘output.pdf’. WeasyPrint handles the conversion internally, abstracting the complexities of the PDF format from the user.
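Because WeasyPrint applies CSS during rendering, page size and margins can be set with a stylesheet. The following is a rough sketch under that assumption: page_css and html_to_pdf_bytes are hypothetical helper names, and the A5 size and 1cm margin are arbitrary illustrative values.

```python
def page_css(size: str = "A4", margin: str = "2cm") -> str:
    """Build a CSS @page rule that sets the paper size and margins."""
    return f"@page {{ size: {size}; margin: {margin}; }}"


def html_to_pdf_bytes(html_string: str, css_string: str) -> bytes:
    """Render HTML plus extra CSS to PDF and return the raw bytes."""
    # Imported here so page_css() can be used without WeasyPrint installed.
    from weasyprint import CSS, HTML

    # With no target argument, write_pdf() returns the PDF as bytes.
    return HTML(string=html_string).write_pdf(
        stylesheets=[CSS(string=css_string)]
    )


# Usage (requires WeasyPrint to be installed):
#   pdf = html_to_pdf_bytes("<h1>Hello, WeasyPrint!</h1>", page_css("A5", "1cm"))
#   with open("output.pdf", "wb") as f:
#       f.write(pdf)
```

Returning bytes instead of writing a file can be convenient when the PDF is sent straight back in an HTTP response.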
Method 2: Using pdfkit
pdfkit is a Python wrapper for the wkhtmltopdf tool, which renders HTML into PDF and various image formats using the Qt WebKit rendering engine. In other words, pdfkit wraps wkhtmltopdf's functionality and makes it accessible from Python.
Here’s an example:
import pdfkit
html_string = '<h1>Welcome to pdfkit!</h1>'
pdfkit.from_string(html_string, 'output.pdf')
Output: A PDF file named ‘output.pdf’ containing the heading “Welcome to pdfkit!”.
This code snippet uses pdfkit’s from_string function, which takes the HTML content and the output file path as arguments and generates a PDF file. It’s a quick way to render HTML content to a static PDF file.
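Since pdfkit only works when the wkhtmltopdf executable is available, it can help to check for it before converting. The sketch below assumes the executable is discoverable on PATH. find_wkhtmltopdf and html_to_pdf are hypothetical helper names, not part of pdfkit.

```python
import shutil
from typing import Optional


def find_wkhtmltopdf() -> Optional[str]:
    """Return the path of the wkhtmltopdf executable, or None if not on PATH."""
    return shutil.which("wkhtmltopdf")


def html_to_pdf(html_string: str, output_path: str) -> None:
    """Convert an HTML string to a PDF file, failing early with a clear message."""
    binary = find_wkhtmltopdf()
    if binary is None:
        raise RuntimeError("wkhtmltopdf was not found on PATH; install it first.")

    # Imported lazily so the check above runs even without pdfkit installed.
    import pdfkit

    # Point pdfkit explicitly at the executable that was found.
    config = pdfkit.configuration(wkhtmltopdf=binary)
    pdfkit.from_string(html_string, output_path, configuration=config)
```

Passing an explicit configuration is also one way to handle installations where wkhtmltopdf lives outside PATH; in that case the path would be supplied by hand instead of through shutil.which.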
Method 3: Using ReportLab
ReportLab is a robust, mature library for generating complex PDFs from Python. It does not render HTML; instead, you build the document programmatically, which allows precise custom layouts but requires you to manage the document’s structure and styling yourself.
Here’s an example:
from reportlab.pdfgen import canvas
c = canvas.Canvas('output.pdf')
c.drawString(72, 800, "Hello, ReportLab!")
c.save()
Output: A PDF with the text “Hello, ReportLab!” starting 72 points from the left and 800 points from the bottom of the page.
This code snippet creates a canvas object on which we manually draw text. The coordinates are specified in points (1/72 of an inch), measured from the bottom-left corner of the page. After adding all the elements, we save the canvas to write the PDF.
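To make the point-based coordinates more concrete, here is a small standard-library sketch of the arithmetic. It assumes an A4 page (297 mm tall), which is ReportLab's default page size unless configured otherwise. mm_to_points and y_from_top are hypothetical helper names.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4

# Height of an A4 page (297 mm) expressed in points.
A4_HEIGHT_PT = 297 / MM_PER_INCH * POINTS_PER_INCH


def mm_to_points(mm: float) -> float:
    """Convert millimetres to PDF points (1 point = 1/72 inch)."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


def y_from_top(distance_from_top_pt: float,
               page_height_pt: float = A4_HEIGHT_PT) -> float:
    """Translate a distance from the top edge into a bottom-based y coordinate."""
    return page_height_pt - distance_from_top_pt


print(round(A4_HEIGHT_PT, 2))                  # 841.89
print(round(mm_to_points(25.4), 2))            # 72.0
print(round(y_from_top(mm_to_points(15)), 2))  # 799.37
```

So the y value of 800 used in the example sits roughly 15 mm below the top edge of an A4 page, and the x value of 72 is one inch from the left edge. ReportLab also ships ready-made unit constants such as inch and mm in reportlab.lib.units.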
Method 4: Using PyPDF2
PyPDF2 is a library that can read, split, merge, crop, and transform the pages of PDF files. It can also add custom information to PDFs and is useful for manipulating existing PDFs, rather than generating new ones from HTML strings.
Here’s an example:
from PyPDF2 import PdfReader, PdfWriter
# Create the reader and writer objects
reader = PdfReader('source.pdf')
writer = PdfWriter()
# Copy all pages from the source PDF to the writer
for page in reader.pages:
    writer.add_page(page)
# Add custom information (document metadata, not HTML)
writer.add_metadata({'/Title': 'Hello, PyPDF2!'})
# Save the new PDF
with open('output.pdf', 'wb') as out_file:
    writer.write(out_file)
Output: A copy of ‘source.pdf’ saved as ‘output.pdf’, with its title metadata set to “Hello, PyPDF2!”.
This example shows how to manipulate an existing PDF file with PyPDF2 by copying its pages and adding custom metadata. PyPDF2 does not provide a method for drawing text onto a page, and it cannot convert HTML to PDF; it is included here for its usefulness in PDF manipulation.
Bonus One-Liner Method 5: Using xhtml2pdf
xhtml2pdf converts HTML/CSS content to PDF in Python. It can be used as a standalone command-line tool or as a Python library.
Here’s an example:
from xhtml2pdf import pisa
html_string = '<p>Quick conversion with xhtml2pdf!</p>'
# The with block ensures the output file is closed after writing
with open('output.pdf', 'wb') as pdf_file:
    pisa.CreatePDF(html_string, dest=pdf_file)
Output: A PDF file containing “Quick conversion with xhtml2pdf!” as a paragraph.
This code snippet demonstrates using xhtml2pdf for quick and straightforward HTML to PDF conversion. The destination file is opened in binary write mode, and the PDF is generated directly from the given HTML string.
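pisa.CreatePDF returns a status object whose err attribute reports conversion errors, so a caller can detect a failed conversion. The wrapper below is a minimal sketch of that check. convert_html_to_pdf is a hypothetical helper name.

```python
def convert_html_to_pdf(html_string: str, output_path: str) -> bool:
    """Write html_string to output_path as a PDF; return True on success."""
    # Imported lazily so this module can be loaded without xhtml2pdf installed.
    from xhtml2pdf import pisa

    with open(output_path, "wb") as pdf_file:
        status = pisa.CreatePDF(html_string, dest=pdf_file)

    # status.err is falsy when the conversion reported no errors.
    return not status.err


# Usage (requires xhtml2pdf to be installed):
#   ok = convert_html_to_pdf("<p>Quick conversion with xhtml2pdf!</p>", "output.pdf")
```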
Summary/Discussion
- Method 1: WeasyPrint. High-quality output. Supports advanced CSS. Can be resource-intensive for complex layouts.
- Method 2: pdfkit. Simple API. Depends on the external wkhtmltopdf tool, which must be installed separately.
- Method 3: ReportLab. Full control over PDF creation. Requires manual layout management. Great for very custom PDF generation.
- Method 4: PyPDF2. Useful for manipulating existing PDFs. Not suitable for direct HTML-to-PDF conversion. Good for merging, splitting, and adding metadata.
- Method 5: xhtml2pdf. Quick and easy. Handles basic HTML/CSS conversions well. May struggle with more complex structures and layouts.
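When deciding between the HTML-capable options, it can be handy to see which of them are installed in the current environment. The following standard-library sketch does only that. The candidate list and its ordering are illustrative assumptions, and available_backends is a hypothetical helper name.

```python
from importlib.util import find_spec

# Import names of the HTML-capable libraries from the summary.
# The ordering is an illustrative assumption, not a ranking.
HTML_BACKENDS = ["weasyprint", "pdfkit", "xhtml2pdf"]


def available_backends(candidates=HTML_BACKENDS):
    """Return the candidate module names that can be imported here."""
    return [name for name in candidates if find_spec(name) is not None]


print(available_backends())
```

Note that pdfkit being importable does not guarantee that wkhtmltopdf itself is installed.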