Python Input/Output – JSON

Over your career as a Data Scientist, there may be instances where you need to move data between a DataFrame and JSON format. This article shows you how to do this using the pandas functions read_json(), to_json(), and build_table_schema().

This article covers the commonly used parameters for each of these functions. For a complete list of all parameters and their use, see the pandas documentation.

Install Required Library

Before any data manipulation can occur, the pandas library must be installed. This library provides the DataFrame and the functions that read and write it.

To install this library, navigate to an IDE terminal and execute the command below at the command prompt. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a confirmation message displays in the terminal.
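As a quick check after installing, the short sketch below imports pandas and prints the installed version. It assumes the install targeted the same Python interpreter that runs the script; the variable name installed_version is only illustrative.

```python
# Confirm that pandas can be imported and report the installed version.
import pandas as pd

installed_version = pd.__version__
print(f"pandas version: {installed_version}")
```

If the import fails with ModuleNotFoundError, the library was most likely installed for a different interpreter or environment.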

Read JSON File

Function Outline:

pandas.io.json.read_json(path_or_buf=None, orient=None, typ='frame', 
                         dtype=None, convert_axes=None, convert_dates=True, 
                         keep_default_dates=True, numpy=False, precise_float=False, 
                         date_unit=None, encoding=None, encoding_errors='strict', 
                         lines=False, chunksize=None, compression='infer', 
                         nrows=None, storage_options=None)

This function converts a JSON string or file to a DataFrame and returns it.

JSON is a common format for saving and exchanging data sets. JSON stands for JavaScript Object Notation and stores data as plain text.

A few need-to-know things about JSON are:

  • The JSON string saves to a flat file (text file).
  • The MIME type is application/json.
  • The file extension is json. For example, myfile.json.
  • The format is commonly used to transmit data between computers.
  • Many programming languages and libraries can read and generate JSON, including Python's pandas!

💡 Note: Converting a string to an object is referred to as de-serialization. Converting an object to a string data type is referred to as serialization.
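To make the two terms concrete, here is a minimal sketch using Python's built-in json module rather than pandas. The record dictionary is sample data modeled on the users in this article.

```python
import json

# A Python object (a dictionary) describing one user.
record = {"user": 1042, "score": 1710, "level": "Expert"}

# Serialization: object -> JSON string.
as_text = json.dumps(record)
print(type(as_text).__name__)  # str
print(as_text)                 # {"user": 1042, "score": 1710, "level": "Expert"}

# De-serialization: JSON string -> object.
restored = json.loads(as_text)
print(restored == record)      # True
```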

Example:

For this example, three new people joined the Finxter Academy one month ago. The Academy wants to watch their puzzle-solving ability progress to test their theory.

To do this, perform the following steps:

  • Highlight the text below. Press CTRL+C to copy the contents to the system Clipboard.
  • Open a text editor (Notepad). Paste the contents (CTRL+V) of the system Clipboard to the file.
  • Save the file finxters.json to the current working directory.

Copy to Clipboard:

[
	{
		"user":  1042,
		"score": 1710,
		"level": "Expert"
	},
	{
		"user":  1043,
		"score": 1960,
		"level": "Authority"
	},
	{
		"user":  1044,
		"score": 1350,
		"level": "Learner"
	}
]
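As an alternative to the copy-and-paste steps, the same file could be written from Python. This is only a sketch using the standard json and pathlib modules; the variable names finxters and target are illustrative, and the script overwrites any existing finxters.json in the current working directory.

```python
import json
from pathlib import Path

# The same three records shown above, as a list of dictionaries.
finxters = [
    {"user": 1042, "score": 1710, "level": "Expert"},
    {"user": 1043, "score": 1960, "level": "Authority"},
    {"user": 1044, "score": 1350, "level": "Learner"},
]

# Serialize the list and save it to the current working directory.
target = Path("finxters.json")
target.write_text(json.dumps(finxters, indent=4), encoding="utf-8")
print(f"Saved {len(finxters)} records to {target.resolve()}")
```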

With the finxters.json file saved to the current working directory, run the code below.

import pandas as pd
df = pd.read_json('finxters.json')
print(df)
  • Line [2] reads in the newly created finxters.json file and assigns the contents to a DataFrame (df).
  • Line [3] outputs the contents to the terminal.

Output:

   user  score      level
0  1042   1710     Expert
1  1043   1960  Authority
2  1044   1350    Learner
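read_json() is not limited to files on disk. The sketch below, which assumes the same three records, reads JSON text held in memory by wrapping it in io.StringIO and then inspects the resulting DataFrame. The name json_text is illustrative.

```python
from io import StringIO

import pandas as pd

# JSON text held in memory instead of in a file.
json_text = """
[
    {"user": 1042, "score": 1710, "level": "Expert"},
    {"user": 1043, "score": 1960, "level": "Authority"},
    {"user": 1044, "score": 1350, "level": "Learner"}
]
"""

# Wrap the text in a file-like object so read_json() can consume it.
df = pd.read_json(StringIO(json_text))

print(df.shape)          # (3, 3)
print(list(df.columns))  # ['user', 'score', 'level']
print(df.dtypes)         # data type inferred for each column
```

Wrapping the text in StringIO avoids passing a raw JSON string directly, which newer pandas releases warn about.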

Send DataFrame to JSON

Function Outline:

pandas.io.json.to_json(path_or_buf, obj, orient=None, date_format='epoch', 
                       double_precision=10, force_ascii=True, 
                       date_unit='ms', default_handler=None, 
                       lines=False, compression='infer', 
                       index=True, indent=0, storage_options=None)

This function sends a DataFrame to JSON.

In the previous section, we created a JSON file and read it into a DataFrame. This example writes that DataFrame to a new JSON file.

import pandas as pd
df = pd.read_json('finxters.json')
df.to_json('newbies.json')
df = pd.read_json('newbies.json')
print(df)
  • Line [2] reads in the existing finxters.json file and assigns the contents to a DataFrame (df).
  • Line [3] sends the DataFrame (df) to a new JSON file, newbies.json.
  • Line [4] reads in the newly created newbies.json file and assigns the contents to a DataFrame (df).
  • Line [5] outputs the contents to the terminal.

The output is the same as above.
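To see what to_json() actually writes, the sketch below calls it without a file path, in which case it returns the JSON as a string. It compares the default layout with orient='records'. The DataFrame is rebuilt inline so the example does not depend on the files above.

```python
import json

import pandas as pd

df = pd.DataFrame({
    "user":  [1042, 1043, 1044],
    "score": [1710, 1960, 1350],
    "level": ["Expert", "Authority", "Learner"],
})

# With no path given, to_json() returns the JSON text as a string.
default_text = df.to_json()                  # default for a DataFrame: orient='columns'
records_text = df.to_json(orient="records")  # one JSON object per row

print(default_text)
print(records_text)

# The 'records' layout matches the structure of finxters.json.
print(json.loads(records_text)[0])  # {'user': 1042, 'score': 1710, 'level': 'Expert'}
```

The default layout keys each column by the row index, which is why reading newbies.json back in reproduces the same DataFrame.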

Create Table Schema

Function Outline:

pandas.io.json.build_table_schema(data, index=True, primary_key=None, version=True)

This function creates a Table Schema from a DataFrame, such as the one built below.

import pandas as pd
from pandas.io.json import build_table_schema

df = pd.DataFrame(
    {'fid':     [1042, 1043, 1044],
     'level':   ['Expert', 'Authority', 'Learner'],
     'months':  [1, 1, 1],
    }, index = pd.Index(range(3), name='idx'))

build_table_schema(df)    

{'fields': [{'name': 'idx',    'type': 'integer'}, 
            {'name': 'fid',    'type': 'integer'}, 
            {'name': 'level',  'type': 'string'},
            {'name': 'months', 'type': 'integer'},
            ], 'primaryKey':  ['idx'], 'pandas_version': '0.20.0'}

print(df)
  • Line [2] imports build_table_schema from the pandas.io.json module.
  • Line [3] creates a DataFrame with field names and accompanying data.
  • Line [4] builds the table schema.
  • Line [5] shows the returned schema: a dictionary containing each field's name and type, the primary key, and the pandas version.
  • Line [6] outputs the contents of the DataFrame to the terminal.

Output:

      fid      level  months
idx
0    1042     Expert       1
1    1043  Authority       1
2    1044    Learner       1
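The index and version parameters from the function outline change what the schema contains. The following sketch, using the same DataFrame, shows one way to inspect the result and to leave out the index and the version entry. The names schema and no_index are illustrative.

```python
import pandas as pd
from pandas.io.json import build_table_schema

df = pd.DataFrame(
    {'fid':    [1042, 1043, 1044],
     'level':  ['Expert', 'Authority', 'Learner'],
     'months': [1, 1, 1],
    }, index=pd.Index(range(3), name='idx'))

# Default call: the named index becomes a field and the primary key.
schema = build_table_schema(df)
field_names = [field['name'] for field in schema['fields']]
print(field_names)           # ['idx', 'fid', 'level', 'months']
print(schema['primaryKey'])  # ['idx']

# Leave out the index and the pandas_version entry.
no_index = build_table_schema(df, index=False, version=False)
print([field['name'] for field in no_index['fields']])  # ['fid', 'level', 'months']
print(sorted(no_index.keys()))                          # ['fields']
```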