In this article, we will cover some of the ways to clean up your data and make it easier to read and access using the built-in Python pprint module.
Remember, your machine is very efficient with your data, but it doesn’t care how that data appears to the human eye.
So let’s learn how to manipulate our data to simplify the analysis.
Python pprint List
To pretty print a list, use the pprint() function from the pprint module, which ships with the Python standard library and doesn’t need to be installed manually. For flat lists, the module prints one list element per line if the whole list doesn’t fit into one output line (80 characters wide by default).
Say, you have the following list of 30 values:
# Not Pretty Printed
lst = ['Alice', 'Bob', 'Carl'] * 10
print(lst)
Output:
['Alice', 'Bob', 'Carl', 'Alice', 'Bob', 'Carl', 'Alice', 'Bob', 'Carl', 'Alice', 'Bob', 'Carl', 'Alice', 'Bob', 'Carl', 'Alice', 'Bob', 'Carl', 'Alice', 'Bob', 'Carl', 'Alice', 'Bob', 'Carl', 'Alice', 'Bob', 'Carl', 'Alice', 'Bob', 'Carl']
One long line, wrapped wherever the terminal happens to break it, is not very pretty. The simplest way to “prettify” the output is to use the pprint.pprint() function like so:
# Pretty Printed
import pprint
pprint.pprint(lst)
Output:
['Alice', 'Bob', 'Carl', 'Alice', 'Bob', 'Carl', 'Alice', 'Bob', 'Carl', 'Alice', 'Bob', 'Carl', 'Alice', 'Bob', 'Carl', 'Alice', 'Bob', 'Carl', 'Alice', 'Bob', 'Carl', 'Alice', 'Bob', 'Carl', 'Alice', 'Bob', 'Carl', 'Alice', 'Bob', 'Carl']
By default, the module will print one element per line if the whole output of the list doesn’t fit into one printed output line.
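The one-line-versus-one-element-per-line rule can be checked with a small sketch. It uses pprint.pformat() so the formatted text can be inspected; the variable names are illustrative only.

```python
import pprint

short_list = ['Alice', 'Bob', 'Carl']        # fits within the default width of 80
long_list = ['Alice', 'Bob', 'Carl'] * 10    # too long for a single 80-character line

short_text = pprint.pformat(short_list)
long_text = pprint.pformat(long_list)

print(short_text)                    # stays on one line
print(len(long_text.splitlines()))   # one output line per element
```

A list that fits within the width is left on one line; only longer lists are broken up.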
“Pretty Print” a JSON File with Python
💡 JSON, or “JavaScript Object Notation”, is a lightweight data interchange format that is friendly to both humans and machines and works very well with Python.
The great news is that if you already know some Python and JavaScript, this will be easy for you.
To get an idea of the JSON format, think of dictionaries in Python: “key”/“value” pairs. The keys are always strings; the values can be strings, numbers, Booleans, arrays, null, or other objects.
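To make the comparison with dictionaries concrete, here is a minimal sketch of how the standard json module maps JSON values onto Python types. The record itself is made up for illustration.

```python
import json

# A made-up JSON object covering the common value types.
raw = '{"name": "Alice", "age": 23, "active": true, "tags": ["a", "b"], "manager": null}'

record = json.loads(raw)

print(type(record).__name__)   # JSON object -> Python dict
print(record["active"])        # JSON true   -> Python True
print(record["manager"])       # JSON null   -> Python None
print(record["tags"])          # JSON array  -> Python list
```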
There are multiple ways to get data for this type of project. I found a JSON file on employees for a dummy company that I will use for this lesson. You can also use the urllib.request module to fetch mock data.
Let’s get into some code and see pprint in action.
First, we need to import our tools:
import json
from pprint import pp  # 💡 Info: pp() is shorthand for pprint() (Python 3.8+); unlike pprint(), it does not sort dictionary keys by default.
Next, let’s write some code to grab our data and assign it to a variable:
with open('EmployeeData.json') as json_file:
    data = json.load(json_file)
Our ‘EmployeeData.json’ file has been loaded and assigned to the variable data.
Now, we can print this and see the output using the standard print() function:
print(data)
Output:
[{'id': 4051, 'name': 'manoj', 'email': 'manoj@gmail.com', 'password': 'Test@123', 'about': None, 'token': '7f471974-ae46-4ac0-a882-1980c300c4d6', 'country': None, 'location': None, 'lng': 0, 'lat': 0, 'dob': None, 'gender': 0, 'userType': 1, 'userStatus': 1, 'profilePicture': 'Images/9b291404-bc2e-4806-88c5-08d29e65a5ad.png', 'coverPicture': 'Images/44af97d9-b8c9-4ec1-a099-010671db25b7.png', 'enablefollowme': False, 'sendmenotifications': False, 'sendTextmessages': False, 'enabletagging': False, 'createdAt': '2020-01-01T11:13:27.1107739', 'updatedAt': '2020-01-02T09:16:49.284864', 'livelng': 77.389849, 'livelat': 2
As we can see, this very small portion of the total output is just one big block of information resembling long lines of a Python dictionary – not very “pretty” or readable for humans.
Let’s take the first step in seeing pprint in action.
pp(data)
Output:
[{'id': 4051,
  'name': 'manoj',
  'email': 'manoj@gmail.com',
  'password': 'Test@123',
  'about': None,
  'token': '7f471974-ae46-4ac0-a882-1980c300c4d6',
  'country': None,
  'location': None,
  'lng': 0,
  'lat': 0,
  'dob': None,
  'gender': 0,
  'userType': 1,
  'userStatus': 1,
  'profilePicture': 'Images/9b291404-bc2e-4806-88c5-08d29e65a5ad.png',
  'coverPicture': 'Images/44af97d9-b8c9-4ec1-a099-010671db25b7.png',
  'enablefollowme': False,
  'sendmenotifications': False,
  'sendTextmessages': False,
  'enabletagging': False,
  'createdAt': '2020-01-01T11:13:27.1107739',
  'updatedAt': '2020-01-02T09:16:49.284864',
  'livelng': 77.389849,
  'livelat': 28.6282231,
  'liveLocation': 'Unnamed Road, Chhijarsi, Sector 63, Noida, Uttar Pradesh '
                  '201307, India',
  'creditBalance': 127,
  'myCash': 0},
 {'id': 4050,
  'name': 'pankaj',
  'email': 'p1@gmail.com',
  'password': 'Test@123',
  'about': None,
  'token': 'e269eeef-1de1-4438-885a-e30a9ad26106',
  'country': None,
  'location': None,
  'lng': 0,
  'lat': 0,
  'dob': None,
  'gender': 0,
  'userType': 1,
  'userStatus': 1,
  'profilePicture': None,
This is another short sample of the output, but we can see how our data is much more readable – organized in key-value pairs, each on a new line.
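If you don’t have a JSON file at hand, the same steps can be tried on an in-memory string. This is only a sketch: the two records are invented stand-ins for the employee file, and pformat() with sort_dicts=False (Python 3.8+) is used to capture the same text that pp() prints.

```python
import json
from pprint import pformat, pp

# Invented records standing in for the contents of a JSON file.
raw = """
[{"id": 1, "name": "manoj", "email": "manoj@example.com",
  "about": null, "userStatus": 1, "enabletagging": false},
 {"id": 2, "name": "pankaj", "email": "pankaj@example.com",
  "about": null, "userStatus": 1, "enabletagging": false}]
"""

records = json.loads(raw)   # json.loads() parses a string, json.load() reads a file

pp(records)                 # prints one key-value pair per line, in file order

text = pformat(records, sort_dicts=False)   # same layout, returned as a string
print(len(text.splitlines()))
```

Each dictionary is too long for one 80-character line, so every key lands on a line of its own.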
You can vertically scan down the data, picking keys that you are interested in. It doesn’t get much simpler than that, and in the video provided in the beginning, I’ve experimented with taking this a bit further.
One other thing to notice as you analyze this output is that the structure of the JSON data is just like my list of dictionaries in the next section.
Using Python pprint to Pretty Up a Dictionary
We’ve seen how we can make a JSON file prettier with pprint. Now let’s create a dictionary and see how we can improve its visual appeal and readability.
A quick note before we get into the code here: there are several ways to get the results we want for the data, so if you know a different way to use pprint, that’s great. We are not covering all its capabilities in this one article, only giving an introduction.
I’ve created a list of dictionaries to use as mock data for this tutorial section.
import pprint

employees = [{"Name": "Jones, Alice", "Age": 23, "email": "alice@gmail.com"},
             {"Name": "Smith, Adam", "Age": 31, "email": "adam@gmail.com"},
             {"Name": "Timms, Carl", "Age": 29, "email": "carl@gmail.com"}]

print(employees)
Output:
[{'Name': 'Jones, Alice', 'Age': 23, 'email': 'alice@gmail.com'}, {'Name': 'Smith, Adam', 'Age': 31, 'email': 'adam@gmail.com'}, {'Name': 'Timms, Carl', 'Age': 29, 'email': 'carl@gmail.com'}]
We can see that the normal Python print function gives us one continuous line of output and breaks up our original organized and neat list.
Now let’s see what we can do with pprint.
pprint.pprint(employees, sort_dicts=False)
I’ve added the sort_dicts parameter to maintain the order of my data. More on this in a minute.
Output:
[{'Name': 'Jones, Alice', 'Age': 23, 'email': 'alice@gmail.com'},
 {'Name': 'Smith, Adam', 'Age': 31, 'email': 'adam@gmail.com'},
 {'Name': 'Timms, Carl', 'Age': 29, 'email': 'carl@gmail.com'}]
That gets us back to our clean, readable output. Now let’s see what kind of a look adding some of pprint’s other parameters gives us.
pprint.pprint(employees, width=2, sort_dicts=False)
I’ve added the “width” parameter at 2 to change the structure of the data, and set “sort_dicts” to False (it is True by default) so pprint won’t change the order of my entries.
Output:
[{'Name': 'Jones, '
          'Alice',
  'Age': 23,
  'email': 'alice@gmail.com'},
 {'Name': 'Smith, '
          'Adam',
  'Age': 31,
  'email': 'adam@gmail.com'},
 {'Name': 'Timms, '
          'Carl',
  'Age': 29,
  'email': 'carl@gmail.com'}]
This gives us a vertical representation of the data – a very clean and interesting view of the same information.
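To see what sort_dicts actually changes, here is a small sketch that formats one of the employee records both ways. pformat() is used so the two results can be compared as strings.

```python
import pprint

employee = {"Name": "Jones, Alice", "Age": 23, "email": "alice@gmail.com"}

sorted_text = pprint.pformat(employee)                     # default: keys are sorted
ordered_text = pprint.pformat(employee, sort_dicts=False)  # keys stay in insertion order

print(sorted_text)    # {'Age': 23, 'Name': 'Jones, Alice', 'email': 'alice@gmail.com'}
print(ordered_text)   # {'Name': 'Jones, Alice', 'Age': 23, 'email': 'alice@gmail.com'}
```

With the default setting, 'Age' moves ahead of 'Name' because the keys are sorted; uppercase letters sort before lowercase ones, so 'email' stays last.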
We can also add an “indent” parameter set to 2 to give some space on the left of the data. Notice the white space after the curly brackets, which makes the data a bit more readable.
pprint.pprint(employees, width=3, indent=2, sort_dicts=False)
Output:
[ { 'Name': 'Jones, '
            'Alice',
    'Age': 23,
    'email': 'alice@gmail.com'},
  { 'Name': 'Smith, '
            'Adam',
    'Age': 31,
    'email': 'adam@gmail.com'},
  { 'Name': 'Timms, '
            'Carl',
    'Age': 29,
    'email': 'carl@gmail.com'}]
We can also use indexing to pull a specific employee out of our data, the same as we do with any Python list.
pprint.pprint(employees[1], sort_dicts=False)
Output:
{'Name': 'Smith, Adam', 'Age': 31, 'email': 'adam@gmail.com'}
And we get the second employee Adam’s information. Other list operations can also be done, such as slicing and appending if we have a new entry to our data.
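Slicing and appending work the same way, since pprint only formats whatever object it is handed. In the sketch below, the fourth employee is a made-up entry added for illustration.

```python
import pprint

employees = [{"Name": "Jones, Alice", "Age": 23, "email": "alice@gmail.com"},
             {"Name": "Smith, Adam", "Age": 31, "email": "adam@gmail.com"},
             {"Name": "Timms, Carl", "Age": 29, "email": "carl@gmail.com"}]

# Append a new (hypothetical) entry, then pretty print only the last two records.
employees.append({"Name": "Lee, Dana", "Age": 27, "email": "dana@example.com"})
last_two = employees[-2:]

pprint.pprint(last_two, sort_dicts=False)
```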
There are many other ways that you can experiment with pprint and its parameters.
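Two more parameters worth trying are depth and compact. The sketch below uses invented data: depth hides everything nested deeper than the given level, and compact packs as many list items as fit onto each line instead of printing one per line.

```python
import pprint

# Invented nested data to demonstrate depth.
nested = {"team": {"lead": {"name": "Alice", "reports": ["Bob", "Carl"]}}}
shallow = pprint.pformat(nested, depth=2)
print(shallow)   # {'team': {'lead': {...}}}

# A list that is too long for one line, to demonstrate compact.
numbers = list(range(30))
one_per_line = pprint.pformat(numbers)
packed = pprint.pformat(numbers, compact=True)

print(len(one_per_line.splitlines()))   # 30 lines, one number each
print(packed)                           # several numbers per line
```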
Python pprint to String
💬 Question: How to get Python’s pprint() function to return a string instead of printing it to the standard output?
To pretty-print to a string instead of the standard output, you can use the pprint.pformat() function instead of pprint.pprint(). The return value of the pformat() function is a string that can be saved to a variable or processed further.
Here’s a minimal example:
import pprint

data = [{'alice': 24, 'bob': 32, 'carl': 45},
        {1: 2, 3: 4, 5: 6},
        {x: y for x, y in zip(range(10), range(10, 20))}]

s = pprint.pformat(data)
# s now holds the prettily formatted string; data itself is unchanged
print(s)
🌍 Learn More: In case you need some information on our approach of creating the third dictionary in the list of dictionaries, check out our article on dictionary comprehension.
Output:
[{'alice': 24, 'bob': 32, 'carl': 45},
 {1: 2, 3: 4, 5: 6},
 {0: 10, 1: 11, 2: 12, 3: 13, 4: 14, 5: 15, 6: 16, 7: 17, 8: 18, 9: 19}]
👉 Further Reading: Python – How to pprint() to a String Not Printing to Shell?
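Because pformat() returns an ordinary string, it can be combined with other text or split into lines. Here is a minimal sketch; the width of 30 and the report wording are arbitrary choices for illustration.

```python
import pprint

data = [{'alice': 24, 'bob': 32, 'carl': 45}, {1: 2, 3: 4, 5: 6}]

s = pprint.pformat(data, width=30)   # a narrow width forces line breaks

# The result is a plain str, so normal string operations apply.
report = "Loaded data:\n" + s
lines = s.splitlines()

print(report)
print(len(lines))
```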
Python pprint to File
You can pretty print to a file by passing a file-like object, such as the one returned by the built-in Python open() function, as the stream argument of the pprint.pprint() function. The pprint module will then automatically call the write() method on the file object.
For example, pprint.pprint(my_list, open('my_file.txt', 'w')) will pretty print the contents of my_list into the file with name 'my_file.txt'.
The following shows a minimal example where we pass the stream argument as a second positional argument into the pprint() function:
import pprint

lst = ['Alice', 'Bob', 'Carl'] * 10

with open('pretty_list.txt', 'w') as outfile:
    pprint.pprint(lst, outfile)
The “output” is a new file 'pretty_list.txt' that contains the pretty-printed list, one element per line.
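The stream argument accepts any object with a write() method, not only real files. As a sketch, an in-memory io.StringIO buffer stands in for the file here, which makes it easy to inspect what would have been written.

```python
import io
import pprint

lst = ['Alice', 'Bob', 'Carl'] * 10

buffer = io.StringIO()             # in-memory stand-in for an open text file
pprint.pprint(lst, stream=buffer)  # same call as before, stream passed by keyword

contents = buffer.getvalue()
print(contents.splitlines()[:3])   # first three lines that were written
```

The written text is the same as what pformat() returns, followed by a trailing newline.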
Conclusion
Take this introductory information and run with it, and you’ll find that this is a powerful module that you will use over and over in your Python data projects, especially when exchanging data from servers to web applications.