Over your career as a Data Scientist, there may be instances where you need to move data between a DataFrame and JSON format. This article shows you how to do this with the pandas functions read_json(), to_json(), and build_table_schema().
This article covers the commonly used parameters for each of these functions. For a complete list of all parameters and their use, see the official pandas documentation.
Preparation
Before any data manipulation can occur, one (1) new library will require installation.
- The Pandas library enables access to/from a DataFrame.
To install this library, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter> key on the keyboard to start the installation process.
If the installation was successful, a message displays in the terminal indicating the same.
Feel free to view the PyCharm installation guide for the required library.
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd
from pandas.io.json import build_table_schema
Read JSON File
Function Outline
pandas.io.json.read_json(path_or_buf=None, orient=None, typ='frame', dtype=None, convert_axes=None, convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, date_unit=None, encoding=None, encoding_errors='strict', lines=False, chunksize=None, compression='infer', nrows=None, storage_options=None)
This function converts a JSON string or file to a DataFrame and returns it.
If working with large data sets, you may want to save the data in JSON format. JSON stands for JavaScript Object Notation and is a text-based (string) format.
A few need-to-know things about JSON are:
- The JSON string saves to a flat file (text file).
- The MIME type is application/json.
- The file extension is json. For example, myfile.json.
- The format transmits data between computers.
- Many coding languages can read and generate JSON, including Python through libraries such as pandas.
💡 Note: Converting a string to an object is called de-serialization. Converting an object to a string is called serialization.
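The two terms in the note can be illustrated with a minimal sketch. It uses Python's built-in json module rather than pandas, and the record variable is a made-up sample for illustration only.

```python
import json

# A Python object: a list holding one dictionary (sample data for illustration).
record = [{"user": 1042, "score": 1710, "level": "Expert"}]

# Serialization: Python object -> JSON string.
json_text = json.dumps(record)

# De-serialization: JSON string -> Python object.
restored = json.loads(json_text)

print(type(json_text).__name__)
print(restored == record)
```

The round trip returns an object equal to the original, which is what makes JSON suitable for moving data between programs.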
Let’s say three new people joined the Finxter Academy one month ago. Naturally, the Academy wants to watch their puzzle-solving ability progress to test their theory.
To do this, perform the following steps:
- Highlight the text below. Press CTRL+C to copy the contents to the system Clipboard.
- Open a text editor (Notepad). Paste the contents (CTRL+V) of the system Clipboard to the file.
- Save the file finxters.json to the current working directory.
[
  { "user": 1042, "score": 1710, "level": "Expert" },
  { "user": 1043, "score": 1960, "level": "Authority" },
  { "user": 1044, "score": 1350, "level": "Learner" }
]
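As an alternative to copying and pasting, the same file could be created from Python. The sketch below is one possible approach, not part of the original steps. It assumes the built-in json and pathlib modules, and the names records and json_path are chosen here for illustration.

```python
import json
from pathlib import Path

# The same three records shown in the JSON text above.
records = [
    {"user": 1042, "score": 1710, "level": "Expert"},
    {"user": 1043, "score": 1960, "level": "Authority"},
    {"user": 1044, "score": 1350, "level": "Learner"},
]

# Write the records to finxters.json in the current working directory.
json_path = Path("finxters.json")
json_path.write_text(json.dumps(records, indent=2), encoding="utf-8")

# Read the file back to confirm it holds valid JSON with three records.
loaded = json.loads(json_path.read_text(encoding="utf-8"))
print(len(loaded))
```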
With the finxters.json file saved to the current working directory, run the code below.
df = pd.read_json('finxters.json')
print(df)
- Line [1] reads in the newly created finxters.json file and assigns the contents to a DataFrame (df).
- Line [2] outputs the contents to the terminal.
Output
  | user | score | level     |
0 | 1042 | 1710  | Expert    |
1 | 1043 | 1960  | Authority |
2 | 1044 | 1350  | Learner   |
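The path_or_buf parameter in the function outline also accepts a file-like object, and orient describes how the JSON is laid out. The sketch below is a minimal illustration that reads the same records from an in-memory string instead of a file. The json_text variable is introduced here for the example.

```python
from io import StringIO

import pandas as pd

# The same records as finxters.json, held in a string instead of a file.
json_text = (
    '[{"user": 1042, "score": 1710, "level": "Expert"},'
    ' {"user": 1043, "score": 1960, "level": "Authority"},'
    ' {"user": 1044, "score": 1350, "level": "Learner"}]'
)

# orient="records" states that the JSON is a list of row objects.
df = pd.read_json(StringIO(json_text), orient="records")

print(df.shape)    # (rows, columns)
print(df.dtypes)   # data type inferred for each column
```

Wrapping the string in StringIO keeps the example independent of any file on disk.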
Send DataFrame to JSON
Function Outline
pandas.io.json.to_json(path_or_buf, obj, orient=None, date_format='epoch', double_precision=10, force_ascii=True, date_unit='ms', default_handler=None, lines=False, compression='infer', index=True, indent=0, storage_options=None)
This function writes a DataFrame to JSON. It is usually called as a DataFrame method, df.to_json(), as in the example here.
In the Read JSON File section above, we created a JSON file and read it into a DataFrame. This example writes that DataFrame to a new JSON file.
df = pd.read_json('finxters.json')
df.to_json('newbies.json')
df = pd.read_json('newbies.json')
print(df)
- Line [1] reads in the existing finxters.json file and assigns the contents to a DataFrame (df).
- Line [2] sends the DataFrame (df) to a new JSON file, newbies.json.
- Line [3] reads in the newly created newbies.json file and assigns the contents to a DataFrame (df).
- Line [4] outputs the contents to the terminal.
The output is the same as above.
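Although the DataFrame prints the same, the JSON text that to_json() writes is laid out by column by default, which differs from the list of records in finxters.json. The sketch below compares the default with orient="records". It uses a two-row sample DataFrame built here for illustration and returns strings instead of writing files.

```python
import json

import pandas as pd

# A small sample DataFrame (two of the users from the earlier example).
df = pd.DataFrame(
    {"user": [1042, 1043], "score": [1710, 1960], "level": ["Expert", "Authority"]}
)

# With no path given, to_json() returns the JSON text as a string.
by_columns = df.to_json()                  # default layout for a DataFrame: by column
by_records = df.to_json(orient="records")  # one JSON object per row

print(by_columns)
print(by_records)

# Parse the strings again to inspect their structure.
columns_obj = json.loads(by_columns)
records_obj = json.loads(by_records)
```

Choosing orient="records" may be preferable when the file should keep the same shape as the original finxters.json.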
Create Table Schema
Function Outline
pandas.io.json.build_table_schema(data, index=True, primary_key=None, version=True)
This function creates a Table Schema from the data below.
df = pd.DataFrame(
    {'fid': [1042, 1043, 1044],
     'level': ['Expert', 'Authority', 'Learner'],
     'months': [1, 1, 1]},
    index=pd.Index(range(3), name='idx'))
schema = build_table_schema(df)
# schema holds a dictionary of this form
# (the 'pandas_version' value depends on the installed pandas release):
# {'fields': [{'name': 'idx', 'type': 'integer'},
#             {'name': 'fid', 'type': 'integer'},
#             {'name': 'level', 'type': 'string'},
#             {'name': 'months', 'type': 'integer'}],
#  'primaryKey': ['idx'],
#  'pandas_version': '0.20.0'}
print(df)
- Line [1] creates a DataFrame with field names and accompanying data.
- Line [2] builds the table schema and saves the returned dictionary to schema.
- Line [3], the comment, shows the returned dictionary: the name and type of each field, the primary key, and the pandas version. The function generates all of this information from the DataFrame.
- Line [4] outputs the contents of the DataFrame to the terminal.
Output
    | fid  | level     | months |
idx |      |           |        |
0   | 1042 | Expert    | 1      |
1   | 1043 | Authority | 1      |
2   | 1044 | Learner   | 1      |
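To look at the schema itself rather than the DataFrame, the returned dictionary can be inspected directly. The sketch below is a minimal illustration. The field_types name is introduced here, and the type reported for text columns such as level may vary with the installed pandas version, so it is printed rather than assumed.

```python
import pandas as pd
from pandas.io.json import build_table_schema

df = pd.DataFrame(
    {"fid": [1042, 1043, 1044],
     "level": ["Expert", "Authority", "Learner"],
     "months": [1, 1, 1]},
    index=pd.Index(range(3), name="idx"),
)

# build_table_schema() returns a plain dictionary describing the DataFrame.
schema = build_table_schema(df)

# Map each field name to the type recorded in the schema.
field_types = {field["name"]: field["type"] for field in schema["fields"]}

print(field_types)
print(schema["primaryKey"])
```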