Problem Formulation
Suppose you have a PDF file, but itβs too large and youβd like to compress it (perhaps you want to reduce its size to allow for faster transfer over the internet, or perhaps to save storage space).
Even more challenging, suppose you have multiple PDF files youβd like to compress.
Multiple online options exist, but these typically allow a limited number of files to be processed at a time. Also of course there is the extra time involved in uploading the originals, then downloading the results. And of course, perhaps you are not comfortable sharing your files with the internet.
Fortunately, we can use Python to address all these concerns. But before we learn how to do this, letβs first learn a little bit about PDF files.
About Compressing PDF Files
According to Dov Isaacs, former Adobe Principal Scientist (see his discussion here) PDF documents are already substantially compressed.
The text and vector graphics portions of the documents are already internally zip-compressed, so there is little opportunity for improvement there.
Instead, any file compression improvements are achieved through compression of image portions of PDF documents, along with potential loss of image quality.
So compression might be achievable, but the user must choose between how much compression versus how much image quality loss is acceptable.
Setup
A programmer going by the handle Theeko74 has written a Python script called βpdf_compressor.py
β. This script is a wrapper for ghostscript
functions that do the actual work of compressing PDF files.
This script is offered under the MIT license and is free to use as the user wishes.
π‘ Hint: make sure you have ghostscript
installed on your computer. To install ghostscript
, follow this detailed guide and come back afterward.
Now download pdf_compressor.py
from GitHub here.
Ultimately we will be writing a Python script to perform the compression.
So we create a directory to hold the script, and use our preferred editor or IDE to create it (this example uses Linux command line to make the directory, and uses vim
as the editor to make script βbpdfc.py
β; use your preferred choice for creating the directory and creating the script within it):
$ mkdir batchPDFcomp $ cd batchPDFcomp $ vim bpdfc.py
We wonβt write out the script just yet – weβll show some details for the script a little later in this article.
When we do write the script, within it weβll import βpdf_compressor.py
β as a module.
To prepare for this we should create a subdirectory below our Python script directory.
Also, weβll need to copy pdf_compressor.py
into that subdirectory, and weβll need to create a file __init__.py
within the same subdirectory (those are double underscores each side of βinit
β):
$ mkdir pdfc $ cp ~/Downloads/pdf_compressor.py ~/batchPDFcomp/pdfc/ $ cd pdfc $ vim __init__.py
What we have done here is created a local package pdfc
containing a module pdf_compressor.py
.
π‘ Note: The presence of file __init__.py
indicates to Python that that directory is part of a package, and to look there for modules.
Now we are ready to write our script.
The PDF Compression Python Script
Here is our script:
from pdfc.pdf_compressor import compress compress('Finxter_WorldsMostDensePythonCheatSheet.pdf', 'Finxter_WorldsMostDensePythonCheatSheet_compr.pdf', power=4)
As you can see itβs a very short script.
First we import the βcompress
β function from βpdf_compressor
β module.
Then we call the βcompress
β function. The function takes as arguments: the input file path, the output file path, and a βpower
β argument that sets compression as follows, from least compression to most (according to the documentation in the script):
Compression levels:
0: default
1: prepress
2: printer
3: ebook
4: screen
Running the Script
Now we can run our script:
$ python bpdfc.py Compress PDF... Compression by 51%. Final file size is 0.2MB Done. $
We have only compressed one PDF document in this example, but by modifying the script to loop through multiple PDF documents one can compress multiple files at once.
However, we leave that as an exercise for the reader!
We hope you have found this article useful. Thank you for reading, and we wish you happy coding!
π Recommended Tutorial: How to Compress Images in Python