In this tutorial, we will learn how to transform a JSON object into a Pandas data frame and the other way around. For these approaches, we will have a look at the Pandas functions read_json()
and to_json()
. Working with JSON objects in Pandas is a very essential skill because we often find data stored in the JSON format.
As you read through the article, feel free to watch the associated article video tutorial:
What is JSON?
Before we get started with the actual Pandas functions, letβs have a look at what JSON stands for. The term βJSONβ is a short word for βJavaScript Object Notationβ. It is a way to store and transport all sorts of information in an organized way. JSON makes data very easy to read and access. It is often used for sending information from a server to a website.
To deepen our understanding of JSON, letβs have a look at an example:
{ "name": "Luke", "age": 31, "married": true, "children": [ { "name": "Ann", "age": 12 }, { "name": "Tom", "age": 8 } ] }
This is a typical JSON object. A new object starts with a curly bracket. The data is displayed as pairs of attributes and values. βname
β, for example, is an attribute and “Luke
” is the value for that attribute. The values can be of any data type: a string, a boolean value, an integer or float value, etc. It can also be an array with more JSON objects. In the example, the βchildrenβ attributeβs value is an array with two more JSON objects.
Now that we know, what a JSON object looks like, itβs time to have a look at the Pandas functions to work with these objects.
Converting a JSON Object into a Pandas Data Frame
In this part, we will see how we can transform a JSON object into a Pandas data frame. We will use some example JSON files with different formats to see how we can apply the read_json()
function in several different ways. To get a detailed overview of all the possibilities with the read_json()
function, check out the official documentation.
The first JSON object we are using looks like this:
{ "columns": [ "length", "height" ], "index": [ "fish", "dog" ], "data": [ [ 3, 1 ], [ 5, 3 ] ] }
This JSON object has three attributes: βcolumns
β, βindex
β, and βdata
β. So, as the names suggest, the βcolumns
β attribute contains the column labels for the data frame we create, the βindex
β attribute contains the respective index labels and the βdata
β attribute contains the data frameβs data.
Now, we head over to the coding section and start working with Pandas:
import pandas as pd df1 = pd.read_json('C:\Projects\Finxter articles example code\data_split.json', orient="split")
The newly-created DataFrame looks like so:
length | height | |
fish | 3 | 1 |
dog | 5 | 3 |
First, we import Pandas. Then, we create a new data frame using the read_json()
function. Inside this function, we pass in the path of where the JSON file is stored. The critical part here is the βorient
β parameter that follows next. This parameter expects a string and is an indication of the expected JSON string format. We assign it the value βsplit
β which means that the JSON format is dictionary-like and has this format:
{index -> [index], columns -> [columns], data -> [values]}
Of course, the JSON file can have other formats as well. In the next step, we will have a look at this JSON file:
[ { "length":3, "height":1 }, { "length":5, "height":3 } ]
In Pandas, we create another data frame:
df2 = pd.read_json("C:\Projects\Finxter articles example code\data_records.json", orient="records")
length | height | |
0 | 3 | 1 |
1 | 5 | 3 |
Again, we create the data frame by applying the read_json()
function and putting in it the path of the JSON file. This time, we assign the value βrecords
β to the βorient
β parameter. This JSON file is structured like a list (because it is wrapped in square brackets) and it has this structure:
[{column -> value}, ..., {column -> value}]
As we can see, the data frame does not have the index labels that we had before since index labels are not covered here. Luckily, Pandas provides us with a function to rename index labels:
df2 = df2.rename(index={0: "fish", 1: "dog"})
length | height | |
fish | 3 | 1 |
dog | 5 | 3 |
We use the βrename
β function and assign a dictionary to the βindex
β parameter to state which indexes we want to rename. The output shows a data frame with the initial index labels.
Letβs have a look at one more example:
{ "fish": { "length": 3, "height": 1 }, "dog": { "length": 5, "height": 3 } }
In Pandas, we create the new data frame like this:
df3 = pd.read_json("C:\Projects\Finxter articles example code\data_index.json", orient="index")
length | height | |
fish | 3 | 1 |
dog | 5 | 3 |
Here, the βorient
β parameter gets assigned the value βindex
β because the JSON file is formatted by the index. The JSON file is structured like this:
{index -> {column -> value}}
Converting a Data Frame into a JSON Object
We have seen how to transform a JSON object into a Pandas data frame. Now we will have a look at the other way around: Converting a data frame into a JSON object. To get a detailed overview of all the possibilities with the to_json()
function, check out the official documentation.
For this purpose, we will use the data frame we have created before:
>>> df3
length | height | |
fish | 3 | 1 |
dog | 5 | 3 |
To convert this data frame into a JSON object, we do the following:
import json json_string = df3.to_json(orient="split") parsed = json.loads(json_string) print(json.dumps(parsed, indent=4))
The output is:
{ "columns": [ "length", "height" ], "index": [ "fish", "dog" ], "data": [ [ 3, 1 ], [ 5, 3 ] ] }
First, we import the JSON library which is used for decoding and encoding JSON objects. Then, we use the Pandas function to_json()
to turn the data frame βdf3
β into a JSON string. Therefore, we use the βorient
β parameter again, just as we did before in the read_json()
function. We assign the βorient
β parameter the string βsplit
β because we want the JSON object to have this structure:
{columns -> [columns], index -> [index], data -> [values]}
After we have successfully created the JSON string, we use the loads()
function from the JSON library which parses the JSON string and converts it into a Python dictionary. Finally, we apply the JSON libraryβs function dumps()
which converts the dictionary into a JSON object. Here, we pass in the βindent
β parameter which defines the indentation of the JSON object. We print the output and get a JSON object with which we are already familiar.
As with reading JSON objects, we can also choose other JSON string formats apart from βsplit
β:
json_string = df3.to_json(orient="values") parsed = json.loads(json_string) print(json.dumps(parsed, indent=4))
The output is:
[ [ 3, 1 ], [ 5, 3 ] ]
Here, we do the same as before, but we change the βorient
β parameter to βvalues
β and the output JSON object shows an array containing more arrays for each row respectively. These nested arrays contain the values.
Letβs have a look at another variation:
json_string = df3.to_json(orient="columns") parsed = json.loads(json_string) print(json.dumps(parsed, indent=4))
The output is:
{ "length": { "fish": 3, "dog": 5 }, "height": { "fish": 1, "dog": 3 } }
In this example, we change the value for the βorient
β parameter to βcolumns
β. This way, we get to have a JSON format like this:
{column -> {index -> value}}
Creating a Table Schema
When creating JSON objects from data frames, it is also possible to build a table schema. To achieve that, we do essentially the same as before when we created JSON objects. This time, we assign the βorient
β parameter the value βtable
β:
json_string = df3.to_json(orient="table") parsed = json.loads(json_string) print(json.dumps(parsed, indent=4))
The output is:
{ "schema": { "fields": [ { "name": "index", "type": "string" }, { "name": "length", "type": "integer" }, { "name": "height", "type": "integer" } ], "primaryKey": [ "index" ], "pandas_version": "0.20.0" }, "data": [ { "index": "fish", "length": 3, "height": 1 }, { "index": "dog", "length": 5, "height": 3 } ] }
This approach provides us with a lot more information because, as we can see, this JSON object is bigger than the other ones we created. It is divided into two big parts: the βschema
β and the βdata
β section.
- The β
data
β section contains the same data as before since we used the same initial data frame. - The β
schema
β section allocates the data types of the index and the columns. Furthermore, this section shows the primary key which is by default set to index level. And it shows the Pandas version which is hardcoded to β0.20.0β.
To get our actual Pandas version, we do this:
print(pd.__version__) # 1.3.4
Writing a JSON Object to a JSON File
In the last sections, we have seen how to transform data frames into JSON objects. Now we will learn how to write these JSON objects that we just created to separate JSON files. The approach is very straightforward and looks like this:
json_string = df3.to_json(orient="columns") parsed = json.loads(json_string) data = json.dumps(parsed, indent=4) with open("new_data.json", "w") as file: file.write(data)
We transform a data frame to a JSON object, just like we did before. But this time, we do not output the JSON object, but assign it to a variable called βdata
β.
Then, we use the βwith
β statement that Python provides us with. The βwith
β statement makes working with files very easy. We create a new file called βnew_data.jsonβ. The file extension β.jsonβ is very important here to tell the program that we want to create a JSON file. We write the JSON object into this file using the βdata
β variable that contains the aforementioned object. Once executed, a new file βnew_data.jsonβ was created in the working directory. When we open it, we get to see the JSON object in a separate file:
Summary
To put it in a nutshell, working with JSON objects in Pandas is very useful since data is often stored in the JSON format. We learned how to transform JSON objects into Pandas data frames and how to turn data frames into JSON objects. Furthermore, we have seen how we can write newly created JSON objects to JSON files.
For more tutorials about Pandas, Python libraries, Python in general, or other computer science-related topics, check out the Finxter blog page.
Happy Coding!