Python Input/Output – JSON

Over your career as a Data Scientist, there may be instances where you need to move data between a DataFrame and JSON format. This article shows you how to do this with the functions read_json(), to_json(), and build_table_schema().

This article covers the commonly used parameters for each of these functions. For a complete list of all parameters and their use, see the pandas documentation.


Preparation

Before any data manipulation can occur, one library must be installed.

  • The Pandas library provides the DataFrame and the functions that read and write it.

To install this library, navigate to an IDE terminal and execute the command below at the command prompt. In the terminal used for this example, the prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.
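
As an optional extra check, the short sketch below imports the library and prints its version. It assumes the installation above went into the same Python environment that runs the script.

```python
import pandas as pd

# If the import succeeds, pandas is installed in this environment.
installed_version = pd.__version__
print(installed_version)
```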


Feel free to view the PyCharm installation guide for the required library.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd 
from pandas.io.json import build_table_schema

Read JSON File

Function Outline

pandas.io.json.read_json(path_or_buf=None, orient=None, typ='frame', 
                         dtype=None, convert_axes=None, convert_dates=True, 
                         keep_default_dates=True, numpy=False, precise_float=False, 
                         date_unit=None, encoding=None, encoding_errors='strict', 
                         lines=False, chunksize=None, compression='infer', 
                         nrows=None, storage_options=None)

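This function converts a JSON string or file to a DataFrame and returns it. It is normally called as pd.read_json().

When you need to save or exchange a data set, JSON is a common choice. JSON stands for JavaScript Object Notation. It is a text-based format, so JSON data is stored and transmitted as a string.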

A few need-to-know things about JSON are:

  • The JSON string saves to a flat file (text file).
  • The MIME type is application/json.
  • The file extension is .json. For example, myfile.json.
  • The format transmits data between computers.
  • Many programming languages and libraries, such as Python with pandas, can read and generate JSON.

💡 Note: Converting a string to an object is called de-serialization. Converting an object to a string data type is referred to as serialization.
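
To make the two terms concrete, here is a minimal sketch using Python's built-in json module rather than pandas. The record is a single entry taken from the sample data used later in this article.

```python
import json

# A JSON string, as it might be read from a file or received over a network.
text = '{"user": 1042, "score": 1710, "level": "Expert"}'

# De-serialization: JSON string -> Python object (here, a dict).
record = json.loads(text)
print(record["level"])

# Serialization: Python object -> JSON string.
round_trip = json.dumps(record)
print(round_trip)
```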

Let's say three new people joined the Finxter Academy one month ago. Naturally, the Academy wants to track how their puzzle-solving ability progresses.

To do this, perform the following steps:

  • Highlight the text below. Press CTRL+C to copy the contents to the system Clipboard.
  • Open a text editor (Notepad). Paste the contents (CTRL+V) of the system Clipboard into the file.
  • Save the file finxters.json to the current working directory.
[
	{
		"user":  1042,
		"score": 1710,
		"level": "Expert"
	},
	{
		"user":  1043,
		"score": 1960,
		"level": "Authority"
	},
	{
		"user":  1044,
		"score": 1350,
		"level": "Learner"
	}
]

With the finxters.json file saved to the current working directory, run the code below.

df = pd.read_json('finxters.json')
print(df)
  • Line [1] reads in the newly created finxters.json file and assigns the contents to a DataFrame (df).
  • Line [2] outputs the contents to the terminal.

Output

   user  score      level
0  1042   1710     Expert
1  1043   1960  Authority
2  1044   1350    Learner
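
Two of the parameters in the function outline, lines and nrows, apply to line-delimited JSON, where each line of the input is one JSON object. The sketch below is an illustration only: it holds the same three records in an in-memory string (the json_lines variable is made up for this example) instead of a file, and wraps it in StringIO so read_json() treats it as a file-like object.

```python
from io import StringIO

import pandas as pd

# The same three records as finxters.json, written one JSON object per line.
json_lines = (
    '{"user": 1042, "score": 1710, "level": "Expert"}\n'
    '{"user": 1043, "score": 1960, "level": "Authority"}\n'
    '{"user": 1044, "score": 1350, "level": "Learner"}\n'
)

# lines=True tells read_json() to parse each line as a separate record.
df_lines = pd.read_json(StringIO(json_lines), lines=True)
print(df_lines)

# nrows limits how many lines are read; it is only valid with lines=True.
first_two = pd.read_json(StringIO(json_lines), lines=True, nrows=2)
print(first_two)
```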

Send DataFrame to JSON

Function Outline

pandas.io.json.to_json(path_or_buf, obj, orient=None, date_format='epoch', 
                       double_precision=10, force_ascii=True, 
                       date_unit='ms', default_handler=None, 
                       lines=False, compression='infer', 
                       index=True, indent=0, storage_options=None)

This function sends a DataFrame to JSON. In practice, it is called as a DataFrame method, df.to_json().

In the Read JSON File section above, we created a JSON file and read it into a DataFrame. This example writes that DataFrame to a new JSON file.

df = pd.read_json('finxters.json')
df.to_json('newbies.json')
df = pd.read_json('newbies.json')
print(df)
  • Line [1] reads in the existing finxters.json file and assigns the contents to a DataFrame (df).
  • Line [2] sends the DataFrame (df) to a new JSON file, newbies.json.
  • Line [3] reads in the newly created newbies.json file and assigns the contents to a DataFrame (df).
  • Line [4] outputs the contents to the terminal.

The output is the same as above.
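
The orient parameter controls the layout of the JSON that to_json() produces. The sketch below compares the default layout with orient='records' on a small two-row DataFrame built just for this illustration. It relies on the fact that to_json() returns the JSON as a string when no file path is given.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"user": [1042, 1043], "level": ["Expert", "Authority"]})

# With no path given, to_json() returns the JSON text instead of writing a file.
by_column = df.to_json()                  # default layout: one object per column
by_record = df.to_json(orient="records")  # a list with one object per row
print(by_column)
print(by_record)

# Read the records layout back into a DataFrame.
restored = pd.read_json(StringIO(by_record), orient="records")
print(restored)
```

The records layout matches the structure of the finxters.json file created earlier: a list of objects, one per row.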


Create Table Schema from DataFrame

Function Outline

pandas.io.json.build_table_schema(data, index=True, primary_key=None, version=True)

This function builds a Table Schema, a dictionary describing each field's name and type, from a DataFrame such as the one below.

df = pd.DataFrame(
    {'fid':     [1042, 1043, 1044],
     'level':   ['Expert', 'Authority', 'Learner'],
     'months':  [1, 1, 1],
    }, index = pd.Index(range(3), name='idx'))

build_table_schema(df)    

{'fields': [{'name': 'idx',    'type': 'integer'},
            {'name': 'fid',    'type': 'integer'},
            {'name': 'level',  'type': 'string'},
            {'name': 'months', 'type': 'integer'},
            ], 'primaryKey':  ['idx'], 'pandas_version': '0.20.0'}

print(df)
  • Line [1] creates a DataFrame with field names and accompanying data.
  • Line [2] builds the table schema.
  • Line [3] is the schema returned by Line [2]: a dictionary containing each field's name and type, the primary key, and the schema version (pandas_version).
  • Line [4] outputs the DataFrame to the terminal.

Output

      fid      level  months
idx
0    1042     Expert       1
1    1043  Authority       1
2    1044    Learner       1
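
The index, primary_key, and version parameters from the function outline change what the schema contains. The sketch below shows their effect on a reduced version of the DataFrame above; only the numeric columns are kept so the example stays short.

```python
import pandas as pd
from pandas.io.json import build_table_schema

df = pd.DataFrame(
    {"fid": [1042, 1043, 1044], "months": [1, 1, 1]},
    index=pd.Index(range(3), name="idx"),
)

# Leave out both the index field and the pandas_version entry.
compact = build_table_schema(df, index=False, version=False)
print(compact)

# Name an explicit primary key instead of defaulting to the index.
keyed = build_table_schema(df, primary_key=["fid"], version=False)
print(keyed["primaryKey"])
```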