How to Compress PDF Files Using Python?

Problem Formulation

Suppose you have a PDF file that is too large and you’d like to compress it, perhaps to speed up transfer over the internet or to save storage space.

Even more challenging, suppose you have multiple PDF files you’d like to compress. 

Multiple online options exist, but these typically limit the number of files that can be processed at a time. There is also the extra time involved in uploading the originals and then downloading the results. And perhaps you are not comfortable uploading your files to a third-party service.

Fortunately, we can use Python to address all these concerns.  But before we learn how to do this, let’s first learn a little bit about PDF files.

About Compressing PDF Files

According to Dov Isaacs, former Adobe Principal Scientist (see his discussion here), PDF documents are already substantially compressed.

The text and vector graphics portions of the documents are already internally zip-compressed, so there is little opportunity for improvement there. 

Instead, any reduction in file size comes from compressing the image portions of a PDF document, usually at the cost of some image quality.

So compression might be achievable, but the user must decide how much image quality loss is acceptable in exchange for a smaller file.

Setup

A programmer going by the handle Theeko74 has written a Python script called “pdf_compressor.py”. This script is a wrapper around Ghostscript, which does the actual work of compressing PDF files.

This script is offered under the MIT license and is free to use as the user wishes.

💡 Hint: Make sure you have Ghostscript installed on your computer. To install Ghostscript, follow this detailed guide and come back afterward.
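
To confirm that Ghostscript is reachable before going further, a quick check from Python can help. This is a minimal sketch: the helper name find_ghostscript is hypothetical, and the list of executable names is an assumption (“gs” is typical on Linux and macOS, “gswin64c” and “gswin32c” on Windows) that may need adjusting for your installation.

```python
import shutil

# Executable names commonly used by Ghostscript. This list is an assumption
# and may need adjusting for your installation.
CANDIDATE_NAMES = ("gs", "gswin64c", "gswin32c")


def find_ghostscript(names=CANDIDATE_NAMES):
    """Return the full path of the first Ghostscript executable on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


gs_path = find_ghostscript()
if gs_path is None:
    print("Ghostscript was not found on PATH; install it before continuing.")
else:
    print(f"Ghostscript found at: {gs_path}")
```

If the check reports that nothing was found even though Ghostscript is installed, the install directory is probably missing from your PATH.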

Now download pdf_compressor.py from GitHub here.

Ultimately we will be writing a Python script to perform the compression. 

So we create a directory to hold the script and use our preferred editor or IDE to create it. This example uses the Linux command line to make the directory and vim to create the script “bpdfc.py”; use whichever tools you prefer for both steps:

$ mkdir batchPDFcomp
$ cd batchPDFcomp
$ vim bpdfc.py

We won’t write out the script just yet – we’ll show some details for the script a little later in this article.

When we do write the script, we’ll import “pdf_compressor.py” into it as a module.

To prepare for this, we should create a subdirectory below our Python script directory.

We’ll also need to copy pdf_compressor.py into that subdirectory and create a file __init__.py within the same subdirectory (those are double underscores on each side of ‘init’):

$ mkdir pdfc
$ cp ~/Downloads/pdf_compressor.py pdfc/
$ cd pdfc
$ vim __init__.py

What we have done here is create a local package pdfc containing the module pdf_compressor (the file pdf_compressor.py).

💡 Note: The presence of the file __init__.py tells Python that the directory is a package and that it should look there for modules.
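
If you are not on a Linux command line, the same layout can be created from Python itself. The following is a minimal sketch under stated assumptions: the helper make_local_package is hypothetical, and the paths in the commented example call are placeholders for wherever your project directory and downloaded file actually live.

```python
import shutil
from pathlib import Path


def make_local_package(project_dir, module_source=None, package_name="pdfc"):
    """Create project_dir/package_name containing an empty __init__.py.

    If module_source points at an existing file, copy it into the package.
    Returns the package directory as a Path.
    """
    package_dir = Path(project_dir) / package_name
    package_dir.mkdir(parents=True, exist_ok=True)
    (package_dir / "__init__.py").touch()  # an empty file is enough
    if module_source is not None and Path(module_source).is_file():
        shutil.copy(module_source, package_dir)
    return package_dir


# Example call (paths are assumptions; adjust them to your machine):
# make_local_package(Path.home() / "batchPDFcomp",
#                    Path.home() / "Downloads" / "pdf_compressor.py")
```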

Now we are ready to write our script.

The PDF Compression Python Script

Here is our script:

from pdfc.pdf_compressor import compress

# Arguments: input file path, output file path, compression level (4 = screen)
compress('Finxter_WorldsMostDensePythonCheatSheet.pdf', 'Finxter_WorldsMostDensePythonCheatSheet_compr.pdf', power=4)

As you can see, it’s a very short script.

First we import the “compress” function from the “pdf_compressor” module.

Then we call the “compress” function. It takes three arguments: the input file path, the output file path, and a ‘power’ argument that sets the compression level as follows, from least compression to most (according to the documentation in the script):

Compression levels:

  • 0: default
  • 1: prepress
  • 2: printer
  • 3: ebook
  • 4: screen
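
These level names match the presets accepted by Ghostscript’s -dPDFSETTINGS option. To illustrate how a wrapper might turn a ‘power’ value into a Ghostscript command line, here is a minimal sketch. It is not the code of pdf_compressor.py: the names PDF_SETTINGS, build_gs_command, and gs_executable are hypothetical, and the sketch assumes the Ghostscript executable is called “gs”.

```python
# Mapping from the 'power' levels listed above to Ghostscript's
# -dPDFSETTINGS presets.
PDF_SETTINGS = {
    0: "/default",
    1: "/prepress",
    2: "/printer",
    3: "/ebook",
    4: "/screen",
}


def build_gs_command(input_path, output_path, power=0, gs_executable="gs"):
    """Build (but do not run) a Ghostscript command that rewrites a PDF."""
    if power not in PDF_SETTINGS:
        raise ValueError(f"power must be one of {sorted(PDF_SETTINGS)}")
    return [
        gs_executable,
        "-sDEVICE=pdfwrite",                      # write a new PDF
        f"-dPDFSETTINGS={PDF_SETTINGS[power]}",   # quality/size preset
        "-dNOPAUSE",                              # do not prompt between pages
        "-dBATCH",                                # exit when finished
        "-dQUIET",                                # suppress routine messages
        f"-sOutputFile={output_path}",
        str(input_path),
    ]


print(build_gs_command("in.pdf", "out.pdf", power=4))
```

The resulting list could be passed to subprocess.run(cmd, check=True) to perform the compression. Building the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.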

Running the Script

Now we can run our script from the batchPDFcomp directory, with the input PDF in that same directory:

$  python bpdfc.py
Compress PDF...
Compression by 51%.
Final file size is 0.2MB
Done.
$ 
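
The percentage and final size in this output can be reproduced from the sizes of the two files. The following is a minimal sketch, assuming the reported figure is the fraction of bytes saved relative to the original; the helper compression_report is hypothetical, and the script’s own calculation may differ in detail.

```python
import os


def compression_report(original_path, compressed_path):
    """Return (ratio, final_size_mb) for two files on disk.

    ratio is the fraction of the original size that was saved,
    e.g. 0.51 means the compressed file is 51% smaller.
    """
    original_size = os.path.getsize(original_path)
    final_size = os.path.getsize(compressed_path)
    if original_size == 0:
        raise ValueError("original file is empty")
    ratio = 1 - final_size / original_size
    return ratio, final_size / 1_000_000  # size in megabytes (10^6 bytes)


# Example usage once both files exist (file names taken from the script above):
# ratio, size_mb = compression_report(
#     "Finxter_WorldsMostDensePythonCheatSheet.pdf",
#     "Finxter_WorldsMostDensePythonCheatSheet_compr.pdf",
# )
# print(f"Compression by {ratio:.0%}.")
# print(f"Final file size is {size_mb:.1f}MB")
```

A negative ratio would mean the output is larger than the input, which can happen when a PDF contains few images.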

We have only compressed one PDF document in this example, but by modifying the script to loop through multiple PDF documents, one can compress many files in a single run.

However, we leave that as an exercise for the reader!
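
For readers who would like a starting point for that exercise, here is one possible sketch. The helper compress_all is hypothetical, as are the directory names in the commented usage. The compression function is passed in as an argument, so the loop assumes only that it can be called the way our script calls compress: input path, output path, and a power keyword.

```python
from pathlib import Path


def compress_all(src_dir, dst_dir, compress_fn, power=3):
    """Compress every .pdf in src_dir into dst_dir using compress_fn.

    compress_fn is called as compress_fn(input_path, output_path, power=power),
    matching how the article's script calls compress().
    Returns the list of output paths that were attempted.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    # sorted() collects the file list up front and gives a stable order
    for pdf in sorted(src_dir.glob("*.pdf")):
        out = dst_dir / f"{pdf.stem}_compr.pdf"
        compress_fn(str(pdf), str(out), power=power)
        outputs.append(out)
    return outputs


# Intended use inside bpdfc.py (directory names are placeholders):
# from pdfc.pdf_compressor import compress
# compress_all("originals", "compressed", compress, power=4)
```

Writing the results to a separate output directory keeps a second run from picking up the already compressed files as new inputs.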

We hope you have found this article useful. Thank you for reading, and we wish you happy coding!
