Generate a Simple PDF using Python ReportLab (9 Steps)

Problem Formulation and Solution Overview

This article shows how to generate a formatted PDF file from a CSV file using the ReportLab and Pandas libraries in conjunction with slicing.

ℹ️ Python offers numerous ways to generate a PDF file. One option is the ReportLab library. This article takes the approach that requires the least amount of code.

To make it more interesting, we have the following running scenario:

🚗 Example Scenario: KarTek, a car dealership in Toronto, previously asked you to import their CSV file into a Database. They would like this data converted to a PDF format to email out to their Sales Staff daily.

To follow along with this article, download the cars.csv file and place it in the current working directory.

Download and save the logo file to the current working directory as car_logo.png.


💬 Question: How would we write code to generate a PDF from a CSV file?

We can create a PDF from a CSV file by performing the following steps:

  1. Install ReportLab and Pandas Libraries
  2. Add Library References
  3. Read CSV File
  4. Calculate Total Pages
  5. Get Stylesheet and Template
  6. Create Page Header
  7. Paginate the Data
  8. Build the PDF
  9. Generate the PDF

Step 1: Install ReportLab and Pandas Libraries

Before moving forward, the ReportLab and Pandas libraries must be installed. To install these libraries, run the following commands at the command prompt.

The ReportLab library is needed to generate a PDF, and the Pandas library is required to read and manipulate the cars.csv file.

pip install reportlab
pip install pandas

Step 2: Add Library References

To run this code error-free, references to the required modules must be added.

To add these references, navigate to the IDE. Create a file called pdf.py and place this file into the current working directory.

Copy the code snippet below and paste this code into the above-created file.

from reportlab.platypus import *
from reportlab.lib.styles import *
from reportlab.lib import *
import pandas as pd
import math

An alternate option would be to reference precisely what is needed.

from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, PageBreak, Image, Paragraph, Spacer
import pandas as pd
import math

The first option is shorter, while the second makes it clear where each name comes from and avoids accidental name clashes. The choice is up to you.

Save this file.


Step 3: Read CSV File

The next step is to read the CSV file and extract the heading row.

Copy and paste this code snippet to the bottom of the pdf.py file.

df_cars = pd.read_csv('cars.csv', sep=';').head(60)
df_data = [df_cars.columns.astype(str).tolist()] + df_cars.values.tolist()
pg_data = df_data[1:]

The first line in the above code snippet reads in the first 60 rows (head(60)) of the cars.csv file. In this file, the field separator character is a semi-colon (;) and must be specified, as read_csv() assumes the separator character is a comma (,). The results save to df_cars.

The following line builds a list of lists. The column names are cast to strings with astype(str) and become the first (header) row, followed by the data rows, which keep their original data types. The result saves to df_data. If output to the terminal, the data would display as shown below (a snippet of the file).

[['Car', 'MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight', 'Acceleration', 'Model', 'Origin'], 
 ['Chevrolet Chevelle Malibu', 18.0, 8, 307.0, 130.0, 3504.0, 12.0, 70, 'US'], 
 ['Buick Skylark 320', 15.0, 8, 350.0, 165.0, 3693.0, 11.5, 70, 'US'], 
 ['Plymouth Satellite', 18.0, 8, 318.0, 150.0, 3436.0, 11.0, 70, 'US'], ...]

The last line removes the header row using slicing, leaving only the data. The results save to pg_data. If output to the terminal, the data would display as shown below (a snippet of the file).

[['Chevrolet Chevelle Malibu', 18.0, 8, 307.0, 130.0, 3504.0, 12.0, 70, 'US'], ['Buick Skylark 320', 15.0, 8, 350.0, 165.0, 3693.0, 11.5, 70, 'US'], ['Plymouth Satellite', 18.0, 8, 318.0, 150.0, 3436.0, 11.0, 70, 'US'], ['AMC Rebel SST', 16.0, 8, 304.0, 150.0, 3433.0, 12.0, 70, 'US'], ...]
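The same three operations can be tried without the real cars.csv file. The sketch below is a minimal illustration that assumes a small, made-up sample in the same semicolon-separated layout, held in memory with io.StringIO. The names sample_csv, df_sample, table, and rows are hypothetical and not part of pdf.py.

```python
import io
import pandas as pd

# Hypothetical three-row sample in the same semicolon-separated layout as cars.csv.
sample_csv = io.StringIO(
    "Car;MPG;Cylinders\n"
    "Chevrolet Chevelle Malibu;18.0;8\n"
    "Buick Skylark 320;15.0;8\n"
    "Plymouth Satellite;18.0;8\n"
)

# sep=';' tells read_csv() to split on semicolons; head(2) keeps the first two rows.
df_sample = pd.read_csv(sample_csv, sep=';').head(2)

# Header row (column names as strings) followed by the data rows.
header = df_sample.columns.astype(str).tolist()
table = [header] + df_sample.values.tolist()

# Slicing from index 1 drops the header row and keeps only the data.
rows = table[1:]

print(table[0])
print(len(rows))
```

Keeping the header in its own row makes it easy to prepend to each page of data later on.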

Save this file.

Step 4: Calculate Total Pages

This code snippet is used to calculate how many pages the PDF will be.

elements = []
recs_pg = 39
tot_pgs = math.ceil(len(pg_data) / recs_pg)

The first line in the above code snippet declares an empty list called elements. This list will hold all the page formatting for the PDF.

The following line declares how many records per page will display (not including the header row). The results save to recs_pg.

The last line uses the ceil() function from the math library to calculate how many pages this PDF will be. This function returns an integer value. The results save to tot_pgs. If output to the terminal, the following will display.

2
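As a quick check on the page arithmetic, the sketch below wraps the same math.ceil() calculation in a small helper. The name count_pages is hypothetical, and the sketch assumes that only data rows are counted, since the header row is repeated on every page.

```python
import math

def count_pages(n_rows, rows_per_page):
    """Return how many pages n_rows data rows need (hypothetical helper)."""
    return math.ceil(n_rows / rows_per_page)

print(count_pages(60, 39))  # 2: 60 rows do not fit on a single 39-row page
print(count_pages(39, 39))  # 1: exactly one full page
```

Rounding up matters here because a partially filled last page is still a page.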

Save this file.

Step 5: Get Stylesheet and Template

This code snippet is used to set the styles and a template for the PDF.

styles = getSampleStyleSheet()
doc = SimpleDocTemplate('inventory.pdf', rightMargin=0, leftMargin=0, topMargin=0, bottomMargin=0)

The first line in the above code snippet calls the getSampleStyleSheet() function. This function returns a stylesheet that gives us access to default styles, such as Title, Heading1, Heading2, etc.

The following line creates a SimpleDocTemplate object and passes it five (5) arguments: the filename of the PDF to generate and the four page margins. The result saves to doc.

Save this file.


Step 6: Create Page Header

This code snippet creates a header for each page of the PDF.

def createPageHeader():
    elements.append(Spacer(1, 10))
    elements.append(Image('car_logo.png', 100, 25))
    elements.append(Paragraph("Inventory", styles['Title']))
    elements.append(Spacer(1, 8))

The first line in the above code snippet declares the function createPageHeader(). The contents of this function will appear at the top of each page.

The following line adds spacing from the top of the page using the Spacer() function and passing it two (2) arguments: the width and height. The results are appended to elements, declared at the beginning of the code.

💡 To achieve the offset, the topMargin argument could also be adjusted. However, we wanted to show another way to achieve the same result.

The next line calls the Image() function and passes it three (3) arguments: the image file, and the width and height at which to draw it, respectively. The results are appended to elements, declared at the beginning of the code.

Then, a heading of Inventory is called using the Paragraph() function and passing it two (2) arguments: the text and the style for the text. The results are appended to elements, declared at the beginning of the code.

The last line appends another Spacer().

Save this file.


Step 7: Paginate the Data

This code snippet shows how the data for each page is paginated.

def paginateInventory(start, stop):
    tbl = Table(df_data[0:1] + pg_data[start:stop])  
    tbl.setStyle(TableStyle([('BACKGROUND', (0, 0), (-1, 0), '#F5F5F5'),
                              ('FONTSIZE', (0, 0), (-1, 0), 8),
                              ('GRID', (0, 0), (-1, -1), .5, '#a7a5a5')])) 
    elements.append(tbl)

The first line in the above code snippet declares the function paginateInventory(), which accepts two (2) arguments: a start and stop position.

The following line creates the data for the page. The Table() function is called and passed one (1) argument: the header row from df_data (df_data[0:1]), plus the data for the specified page (pg_data[start:stop]). These results save to tbl.

The next line sets the format for the table on the page. This line does the following:

  • Changes the background color of the header row.
  • Sets the font size of the header row.
  • Sets up the grid lines between all the columns and rows.

The tbl is then appended to elements, declared at the beginning of the code.

Save this file.

Step 8: Build the PDF

This code snippet creates a function to loop through the paginated data and builds a PDF.

def generatePDF():
    cur_pg = 0
    start_pos = 0
    stop_pos = recs_pg

    for cur_pg in range(tot_pgs):
        createPageHeader()
        paginateInventory(start_pos, stop_pos)
        elements.append(PageBreak())
        start_pos += recs_pg
        stop_pos += recs_pg
    doc.build(elements)

The first line in the above code snippet declares the function generatePDF().

The following three (3) lines declare three (3) variables and their initial positions.

  • cur_pg, which keeps track of the page we are currently on.
  • start_pos, which is where the data initially starts.
  • stop_pos, which is where the data initially ends (recs_pg, or 39).

The following section does all the work! It declares a for loop, which loops through all pre-determined pages (tot_pgs, or 2 in this case). Each iteration creates a page by:

  • Executing the createPageHeader() function.
  • Outputting the page data using slicing (paginateInventory(start_pos, stop_pos)).
  • Adding a page break (PageBreak()).
  • Recalculating the start_pos and stop_pos variables.

This loop continues until all pages have been generated.

The last line in this code snippet builds the PDF.
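The start and stop bookkeeping in the loop can be checked in isolation. The sketch below is a plain-Python illustration that uses a list of 60 integers as a stand-in for the data rows. The helper name page_slices is hypothetical.

```python
def page_slices(n_rows, rows_per_page):
    """Return one (start, stop) pair per page (hypothetical helper)."""
    return [
        (start, start + rows_per_page)
        for start in range(0, n_rows, rows_per_page)
    ]

rows = list(range(60))  # stand-in for 60 data rows

for start, stop in page_slices(len(rows), 39):
    page = rows[start:stop]  # slicing past the end of a list is safe in Python
    print(start, stop, len(page))
# prints:
# 0 39 39
# 39 78 21
```

The second slice asks for rows 39 to 78, but only 21 rows remain, so the last page is simply shorter.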

Save this file.


Step 9: Generate the PDF

This code snippet generates the PDF built earlier.

If you ran the code so far, no PDF file would be generated, because generatePDF() has been defined but never called. To call it, append the following code to the end of the pdf.py file.

The first line checks whether the file is being run directly (rather than imported as a module). If so, generatePDF() is called.

if __name__ == '__main__':
    generatePDF()

An alternative is to just call the function.

generatePDF()

There are pros and cons to both options. However, the choice is up to you.
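The difference between the two options only shows up when the file is imported by another script. The sketch below illustrates the guard with a stand-in function. The name generate_report is hypothetical and simply returns a message instead of writing a PDF.

```python
def generate_report():
    """Stand-in for generatePDF(); returns a message instead of writing a file."""
    return "report generated"

if __name__ == '__main__':
    # Runs only when this file is executed directly, e.g. `python pdf.py`.
    # If another script imports this file, the call below is skipped.
    print(generate_report())
```

With the bare function call, importing the file would generate the PDF as a side effect, which is rarely what an importing script wants.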

Save and run this file.

If successful, the inventory.pdf file will be saved to the current working directory.


The Full Code

from reportlab.platypus import *
from reportlab.lib.styles import *
from reportlab.lib import *
import pandas as pd
import math

df_cars = pd.read_csv('cars.csv', sep=';').head(60)
df_data = [df_cars.columns.astype(str).tolist()] + df_cars.values.tolist()
pg_data = df_data[1:]

elements = []
recs_pg = 39
tot_pgs = math.ceil(len(pg_data) / recs_pg)

styles = getSampleStyleSheet()
doc = SimpleDocTemplate('inventory.pdf', rightMargin=0, leftMargin=0, topMargin=0, bottomMargin=0)

def createPageHeader():
    elements.append(Spacer(1, 10))
    elements.append(Image('car_logo.png', 100, 25))
    elements.append(Paragraph("Inventory", styles['Title']))
    elements.append(Spacer(1, 8))

def paginateInventory(start, stop):
    tbl = Table(df_data[0:1] + pg_data[start:stop])  
    tbl.setStyle(TableStyle([('BACKGROUND', (0, 0), (-1, 0), '#F5F5F5'),
                              ('FONTSIZE', (0, 0), (-1, 0), 8),
                              ('GRID', (0, 0), (-1, -1), .5, '#a7a5a5')])) 
    elements.append(tbl)

def generatePDF():
    cur_pg = 0
    start_pos = 0
    stop_pos = recs_pg

    for cur_pg in range(tot_pgs):
        createPageHeader()
        paginateInventory(start_pos, stop_pos)
        elements.append(PageBreak())
        start_pos += recs_pg
        stop_pos += recs_pg
    doc.build(elements)

generatePDF()

Summary

This article has shown you a compact way to generate a customized PDF file.

Good Luck & Happy Coding!