Reading and Writing JSON with Pandas

Rate this post

In this tutorial, we will learn how to transform a JSON object into a Pandas data frame and the other way around. For these approaches, we will have a look at the Pandas functions read_json() and to_json(). Working with JSON objects in Pandas is a very essential skill because we often find data stored in the JSON format.

As you read through the article, feel free to watch the associated article video tutorial:

What is JSON?

Before we get started with the actual Pandas functions, let’s have a look at what JSON stands for. The term “JSON” is a short word for “JavaScript Object Notation”. It is a way to store and transport all sorts of information in an organized way. JSON makes data very easy to read and access. It is often used for sending information from a server to a website.

To deepen our understanding of JSON, let’s have a look at an example:

{
    "name": "Luke",
    "age": 31,
    "married": true,
    "children": [
        {
            "name": "Ann",
            "age": 12
        },
        {
            "name": "Tom",
            "age": 8
        }
    ]
}

This is a typical JSON object. A new object starts with a curly bracket. The data is displayed as pairs of attributes and values. “name”, for example, is an attribute and “Luke” is the value for that attribute. The values can be of any data type: a string, a boolean value, an integer or float value, etc. It can also be an array with more JSON objects. In the example, the “children” attribute’s value is an array with two more JSON objects.

Now that we know, what a JSON object looks like, it’s time to have a look at the Pandas functions to work with these objects.

Converting a JSON Object into a Pandas Data Frame

In this part, we will see how we can transform a JSON object into a Pandas data frame. We will use some example JSON files with different formats to see how we can apply the read_json() function in several different ways. To get a detailed overview of all the possibilities with the read_json() function, check out the official documentation.

The first JSON object we are using looks like this:

{
    "columns": [
        "length",
        "height"
    ],
    "index": [
        "fish",
        "dog"
    ],
    "data": [
        [
            3,
            1
        ],
        [
            5,
            3
        ]
    ]
}

This JSON object has three attributes: “columns”, “index”, and “data”. So, as the names suggest, the “columns” attribute contains the column labels for the data frame we create, the “index” attribute contains the respective index labels and the “data” attribute contains the data frame’s data.

Now, we head over to the coding section and start working with Pandas:

import pandas as pd

df1 = pd.read_json('C:\Projects\Finxter articles example code\data_split.json', orient="split")

The newly-created DataFrame looks like so:

lengthheight
fish31
dog53

First, we import Pandas. Then, we create a new data frame using the read_json() function. Inside this function, we pass in the path of where the JSON file is stored. The critical part here is the “orient” parameter that follows next. This parameter expects a string and is an indication of the expected JSON string format. We assign it the value “split” which means that the JSON format is dictionary-like and has this format:

{index -> [index], columns -> [columns], data -> [values]}

Of course, the JSON file can have other formats as well. In the next step, we will have a look at this JSON file:

[
    {
        "length":3,
        "height":1
    },
    {
        "length":5,
        "height":3
    }
]

In Pandas, we create another data frame:

 
df2 = pd.read_json("C:\Projects\Finxter articles example code\data_records.json", orient="records")
lengthheight
031
153

Again, we create the data frame by applying the read_json() function and putting in it the path of the JSON file. This time, we assign the value “records” to the “orient” parameter. This JSON file is structured like a list (because it is wrapped in square brackets) and it has this structure:

[{column -> value}, ..., {column -> value}]

As we can see, the data frame does not have the index labels that we had before since index labels are not covered here. Luckily, Pandas provides us with a function to rename index labels:

df2 = df2.rename(index={0: "fish", 1: "dog"})
lengthheight
fish31
dog53

We use the “rename” function and assign a dictionary to the “index” parameter to state which indexes we want to rename. The output shows a data frame with the initial index labels.

Let’s have a look at one more example:

{
    "fish": {
        "length": 3,
        "height": 1
    },
    "dog": {
        "length": 5,
        "height": 3
    }
}

In Pandas, we create the new data frame like this:

 
df3 = pd.read_json("C:\Projects\Finxter articles example code\data_index.json", orient="index")
lengthheight
fish31
dog53

Here, the “orient” parameter gets assigned the value “index” because the JSON file is formatted by the index. The JSON file is structured like this:

{index -> {column -> value}}

Converting a Data Frame into a JSON Object

We have seen how to transform a JSON object into a Pandas data frame. Now we will have a look at the other way around: Converting a data frame into a JSON object. To get a detailed overview of all the possibilities with the to_json() function, check out the official documentation.

For this purpose, we will use the data frame we have created before:

>>> df3
lengthheight
fish31
dog53

To convert this data frame into a JSON object, we do the following:

import json

json_string = df3.to_json(orient="split")
parsed = json.loads(json_string)

print(json.dumps(parsed, indent=4))

The output is:

{
    	"columns": [
        		"length",
        		"height"
    	],
   	 "index": [
        		"fish",
        		"dog"
    	],
    	"data": [
        		[
            			3,
            			1
        		],
        		[
           			 5,
            			 3
        		]
    	]
}

First, we import the JSON library which is used for decoding and encoding JSON objects. Then, we use the Pandas function to_json() to turn the data frame “df3” into a JSON string. Therefore, we use the “orient” parameter again, just as we did before in the read_json() function. We assign the “orient” parameter the string “split” because we want the JSON object to have this structure:

{columns -> [columns], index -> [index], data -> [values]}

After we have successfully created the JSON string, we use the loads() function from the JSON library which parses the JSON string and converts it into a Python dictionary. Finally, we apply the JSON library’s function dumps() which converts the dictionary into a JSON object. Here, we pass in the “indent” parameter which defines the indentation of the JSON object. We print the output and get a JSON object with which we are already familiar.

As with reading JSON objects, we can also choose other JSON string formats apart from “split”:

json_string = df3.to_json(orient="values")
parsed = json.loads(json_string)

print(json.dumps(parsed, indent=4))

The output is:

[
    	[
        		3,
        		1
    	],
    	[
        		5,
        		3
    	]
]

Here, we do the same as before, but we change the “orient” parameter to “values” and the output JSON object shows an array containing more arrays for each row respectively. These nested arrays contain the values.

Let’s have a look at another variation:

json_string = df3.to_json(orient="columns")
parsed = json.loads(json_string)

print(json.dumps(parsed, indent=4))

The output is:

{
    	"length": {
        		"fish": 3,
        		"dog": 5
   	 },
    	"height": {
       		 "fish": 1,
        		"dog": 3
    	}
}

In this example, we change the value for the “orient” parameter to “columns”. This way, we get to have a JSON format like this:

{column -> {index -> value}}

Creating a Table Schema

When creating JSON objects from data frames, it is also possible to build a table schema. To achieve that, we do essentially the same as before when we created JSON objects. This time, we assign the “orient” parameter the value “table”:

json_string = df3.to_json(orient="table")
parsed = json.loads(json_string)

print(json.dumps(parsed, indent=4))

The output is:

{
    	"schema": {
        		"fields": [
            			{
                			"name": "index",
                			"type": "string"
            			},
            			{
               			"name": "length",
                			"type": "integer"
            			},
            			{
                			"name": "height",
                			"type": "integer"
            			}
        		],
        		"primaryKey": [
            			"index"
        		],
        		"pandas_version": "0.20.0"
    	},
    	"data": [
        		{
            			"index": "fish",
            			"length": 3,
            			"height": 1
        		},
        		{
            			"index": "dog",
            			"length": 5,
            			"height": 3
        		}
    	]
}

This approach provides us with a lot more information because, as we can see, this JSON object is bigger than the other ones we created. It is divided into two big parts: the “schema” and the “data” section.

  • The “data” section contains the same data as before since we used the same initial data frame.
  • The “schema” section allocates the data types of the index and the columns. Furthermore, this section shows the primary key which is by default set to index level. And it shows the Pandas version which is hardcoded to “0.20.0”.

To get our actual Pandas version, we do this:

print(pd.__version__)
# 1.3.4

Writing a JSON Object to a JSON File

In the last sections, we have seen how to transform data frames into JSON objects. Now we will learn how to write these JSON objects that we just created to separate JSON files. The approach is very straightforward and looks like this:

json_string = df3.to_json(orient="columns")
parsed = json.loads(json_string)
data = json.dumps(parsed, indent=4)

with open("new_data.json", "w") as file:
    file.write(data)

We transform a data frame to a JSON object, just like we did before. But this time, we do not output the JSON object, but assign it to a variable called “data”.

Then, we use the “with” statement that Python provides us with. The “with” statement makes working with files very easy. We create a new file called “new_data.json”. The file extension “.json” is very important here to tell the program that we want to create a JSON file. We write the JSON object into this file using the “data” variable that contains the aforementioned object. Once executed, a new file “new_data.json” was created in the working directory. When we open it, we get to see the JSON object in a separate file:

Summary

To put it in a nutshell, working with JSON objects in Pandas is very useful since data is often stored in the JSON format. We learned how to transform JSON objects into Pandas data frames and how to turn data frames into JSON objects. Furthermore, we have seen how we can write newly created JSON objects to JSON files.

For more tutorials about Pandas, Python libraries, Python in general, or other computer science-related topics, check out the Finxter blog page.

Happy Coding!