This article focuses on the serialization and conversion methods of a pandas DataFrame: from_dict(), to_dict(), from_records(), to_records(), to_json(), and to_pickle().
Let’s get started!
Preparation
Before any data manipulation can occur, two (2) libraries need to be installed.
- The Pandas library enables access to/from a DataFrame.
- The NumPy library supports multi-dimensional arrays and matrices in addition to a collection of mathematical functions.
To install these libraries, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter> key on the keyboard to start the installation process.
$ pip install numpy
Hit the <Enter> key on the keyboard to start the installation process.
If the installations were successful, a message displays in the terminal indicating the same.
Feel free to view the PyCharm installation guide for the required libraries.
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd
import numpy as np
DataFrame.from_dict()
The from_dict() classmethod converts a valid dictionary structure into a DataFrame. By default, the keys of the original dictionary translate to DataFrame columns.
The syntax for this method is as follows:
classmethod DataFrame.from_dict(data, orient='columns', dtype=None, columns=None)
Parameter | Description |
---|---|
data | The parameter is a valid dictionary to be converted. |
orient | The available options are: – 'columns' : if keys are columns, pass this option. Selected by default. – 'index' : if keys are rows, pass this option. – 'tight' : if passed, assume a dictionary with the keys 'index', 'columns', 'data', 'index_names', and 'column_names'. |
dtype | This parameter is the data type to force. By default, the data type is inferred. |
columns | This parameter is the column(s) to use if orient is 'index' . |
For this example, a dictionary containing the first five (5) elements of the Periodic Table is converted to a DataFrame.
elements = {'Hydrogen': [1, 1766], 'Helium': [2, 1868], 'Lithium': [3, 1817], 'Beryllium': [4, 1798], 'Boron': [5, 1808]}
periodic_df = pd.DataFrame.from_dict(elements, orient='index', columns=['Atomic #', 'Discovered'])
print(periodic_df)
- Line [1] creates a dictionary of lists and saves it to the variable elements.
- Line [2] does the following:
- creates a DataFrame from the elements Dictionary
- sets the orient parameter to index
- sets the column names to clearly identify the data
- saves the output to the periodic_df DataFrame
- Line [3] outputs the DataFrame to the terminal.
Output
 | Atomic # | Discovered |
Hydrogen | 1 | 1766 |
Helium | 2 | 1868 |
Lithium | 3 | 1817 |
Beryllium | 4 | 1798 |
Boron | 5 | 1808 |
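To make the effect of the orient parameter concrete, the sketch below builds a DataFrame from the same kind of data in two ways. It is an illustration only: the variable names (columns_data, rows_data, by_column, by_row) are not from the original example, and just three elements are used to keep it short.

```python
import pandas as pd

# Keys are column names: this matches the default orient='columns'.
columns_data = {
    'Atomic #': [1, 2, 3],
    'Discovered': [1766, 1868, 1817],
}
by_column = pd.DataFrame.from_dict(columns_data)
print(by_column)

# Keys are row labels: orient='index' turns each key into a row,
# and the columns parameter supplies the column names.
rows_data = {'Hydrogen': [1, 1766], 'Helium': [2, 1868], 'Lithium': [3, 1817]}
by_row = pd.DataFrame.from_dict(rows_data, orient='index',
                                columns=['Atomic #', 'Discovered'])
print(by_row)
```

With the default orientation the rows get a numeric index, while orient='index' keeps the element names as row labels.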
DataFrame.to_dict()
The to_dict() method converts a valid DataFrame structure to a dictionary format.
The syntax for this method is as follows:
DataFrame.to_dict(orient='dict', into=<class 'dict'>)
Parameter | Description |
---|---|
orient | This parameter sets the values of the dictionary. The available options are: – 'dict' : dictionary: {column -> {index -> value}} – 'list' : dictionary: {column -> [values]} – 'series' : dictionary: {column -> Series(values)} – 'split' : dictionary: {'index' -> [index], 'columns' -> [columns], 'data' -> [values]} – 'tight' : dictionary: {'index' -> [index], 'columns' -> [columns], 'data' -> [values], 'index_names' -> [index.names], 'column_names' -> [column.names]} – 'records' : list: [{column -> value}, … , {column -> value}] – 'index' : dictionary: {index -> {column -> value}} |
into | This parameter sets the data structure to convert the data into. The default value is a dictionary. |
This example reads the first five (5) rows and three (3) columns of the file into a DataFrame. This DataFrame then converts to a dictionary format.
Click here to save this CSV file and move it to the current working directory.
df = pd.read_csv('finxters.csv', usecols=['FID', 'First_Name', 'Last_Name']).head()
print(df)
result = df.to_dict()
print(result)
- Line [1] reads in the first five (5) rows (head) and three (3) columns (usecols) of the finxters.csv file. The output saves to a DataFrame (df).
- Line [2] outputs the DataFrame to the terminal.
- Line [3] converts the DataFrame (df) to a dictionary. The output saves to result.
- Line [4] outputs the result to the terminal.
Output – df
 | FID | First_Name | Last_Name |
0 | 30022145 | Steve | Hamilton |
1 | 30022192 | Amy | Pullister |
2 | 30022331 | Peter | Dunn |
3 | 30022345 | Marcus | Williams |
4 | 30022359 | Alice | Miller |
Output – result
{'FID': {0: 30022145, 1: 30022192, 2: 30022331, 3: 30022345, 4: 30022359}, |
If 'split' was passed as the orient argument to the to_dict() method, the output would be as follows:
df = pd.read_csv('finxters.csv', usecols=['FID', 'First_Name', 'Last_Name']).head()
print(df)
result = df.to_dict('split')
print(result)
Output – result
{'index': [0, 1, 2, 3, 4], |
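For readers who want to compare orientations without the CSV file, here is a small self-contained sketch. The two-row DataFrame is a hypothetical stand-in for finxters.csv, and the variable names as_list, as_records, and as_split are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the first two rows of finxters.csv.
df = pd.DataFrame({'FID': [30022145, 30022192],
                   'First_Name': ['Steve', 'Amy']})

# 'list': one list of values per column.
as_list = df.to_dict(orient='list')
print(as_list)

# 'records': one dictionary per row.
as_records = df.to_dict(orient='records')
print(as_records)

# 'split': index, columns, and data kept as separate entries.
as_split = df.to_dict(orient='split')
print(as_split)
```

The 'records' orientation is often the most convenient when each row should be handled as its own dictionary, while 'split' keeps the index and column labels apart from the values.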
DataFrame.from_records()
The from_records() classmethod converts a valid ndarray, tuple, or dictionary structure into a DataFrame format.
The syntax for this method is as follows:
classmethod DataFrame.from_records(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)
Parameter | Description |
---|---|
data | This parameter is a valid ndarray , tuple, or dictionary structure. |
index | A field of arrays for the index or a list containing a specific set. |
exclude | The columns/fields to exclude from the conversion. |
columns | The column names to use in the conversion. |
coerce_float | If True, this parameter attempts to convert non-string, non-numeric values (such as decimal.Decimal) to floats. |
nrows | If an iterator, the number of rows to read in. |
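This example converts a NumPy ndarray, built from a list of tuples and containing four (4) fictitious Finxter users, to a DataFrame. Because an ndarray holds a single data type, NumPy stores the IDs as strings alongside the usernames.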
data = np.array([(30022145, 'wildone92'), (30022192, 'AmyP'), (30022331, '1998_pete'), (30022345, 'RexTex')])
users_df = pd.DataFrame.from_records(data, columns=['ID', 'Username'])
print(users_df)
- Line [1] creates an ndarray from a list of tuples and saves it to the data variable.
- Line [2] does the following:
- creates a DataFrame from the data variable
- sets the column names to clearly identify the data
- Line [3] outputs the DataFrame to the terminal.
Output
 | ID | Username |
0 | 30022145 | wildone92 |
1 | 30022192 | AmyP |
2 | 30022331 | 1998_pete |
3 | 30022345 | RexTex |
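As a variation, from_records() also accepts a plain list of tuples, which keeps the IDs as integers. The sketch below assumes the same fictitious users; the names records and indexed_df are illustrative. It also shows the index parameter promoting a field to the row index.

```python
import pandas as pd

# A plain list of tuples (no NumPy array), so each field keeps its own type.
records = [(30022145, 'wildone92'), (30022192, 'AmyP'), (30022331, '1998_pete')]

users_df = pd.DataFrame.from_records(records, columns=['ID', 'Username'])
print(users_df.dtypes)

# index='ID' uses the ID field as the row index instead of a column.
indexed_df = pd.DataFrame.from_records(records, columns=['ID', 'Username'],
                                       index='ID')
print(indexed_df)
```

Using the ID as the index makes label-based lookups such as indexed_df.loc[30022192] possible.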
DataFrame.to_records()
The to_records() method converts a valid DataFrame structure to a NumPy record array. The index is included as the first field if requested.
The syntax for this method is as follows:
DataFrame.to_records(index=True, column_dtypes=None, index_dtypes=None)
Parameter | Description |
---|---|
index | This parameter, if True, includes the index in the record array. This value saves to the index field or index label. |
column_dtypes | The data type to store the columns. If a dictionary, each column maps accordingly. |
index_dtypes | The data type to store index levels. If a dictionary, each index level and indices map accordingly. |
This example reads the first five (5) rows and three (3) columns of the file into a DataFrame. This DataFrame then converts to records.
Click here to save this CSV file and move it to the current working directory.
df = pd.read_csv('finxters.csv', usecols=['FID', 'First_Name', 'Last_Name']).head()
print(df)
result = df.to_records()
print(result)
- Line [1] reads in the first five (5) rows (head) and three (3) columns (usecols) of the finxters.csv file. The output saves to a DataFrame (df).
- Line [2] outputs the DataFrame to the terminal.
- Line [3] converts the DataFrame (df) to records. The output saves to result.
- Line [4] outputs the result to the terminal.
Output – df
 | FID | First_Name | Last_Name |
0 | 30022145 | Steve | Hamilton |
1 | 30022192 | Amy | Pullister |
2 | 30022331 | Peter | Dunn |
3 | 30022345 | Marcus | Williams |
4 | 30022359 | Alice | Miller |
Output – result
[(0, 30022145, 'Steve', 'Hamilton') (1, 30022192, 'Amy', 'Pullister') |
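The effect of the index parameter can be checked on a small in-memory DataFrame. The sketch below uses a hypothetical two-row stand-in for finxters.csv, with illustrative variable names.

```python
import pandas as pd

# Hypothetical stand-in for the first two rows of finxters.csv.
df = pd.DataFrame({'FID': [30022145, 30022192],
                   'First_Name': ['Steve', 'Amy']})

# Default: the index becomes the first field of each record.
with_index = df.to_records()
print(with_index.dtype.names)

# index=False leaves the index out of the record array.
without_index = df.to_records(index=False)
print(without_index.dtype.names)

# Fields of a record array can be accessed by name.
print(without_index['First_Name'])
```

Passing index=False is useful when the default numeric index carries no meaning and only the column data is needed.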
DataFrame.to_json()
The to_json() method converts a DataFrame object to a JSON string.
💡 Note: Any NaN/None values will convert to null values. Any DateTime objects will convert to UNIX timestamps.
The syntax for this method is as follows:
DataFrame.to_json(path_or_buf=None, orient=None, date_format=None, double_precision=10, force_ascii=True, date_unit='ms', default_handler=None, lines=False, compression='infer', index=True, indent=None, storage_options=None)
Parameter | Description |
---|---|
path_or_buf | This parameter is a string, path, or file object with a write function. |
orient | This parameter is the expected JSON format. For a Series, the default is 'index' and the allowed values are 'split', 'records', 'index', and 'table'. For a DataFrame, the default is 'columns' and the allowed values are 'split', 'records', 'index', 'columns', 'values', and 'table'. The formats are: – 'split' : dictionary: {'index' -> [index], 'columns' -> [columns], 'data' -> [values]} – 'records' : list: [{column -> value}, … , {column -> value}] – 'index' : dictionary: {index -> {column -> value}} – 'columns' : dictionary: {column -> {index -> value}} – 'values' : just the values array – 'table' : dictionary: {'schema': {schema}, 'data': {data}} |
date_format | This is the format of the date conversion. The options are:'epoch' or 'iso' . |
double_precision | The decimal places to use when encoding float values. |
force_ascii | Whether to force the encoded string to be valid ASCII. |
date_unit | The unit of time for encoding. |
default_handler | The handler to call if an object cannot otherwise be converted to a suitable format for JSON. |
lines | If orient is ‘records’ , then write a line delimited JSON string. |
compression | If 'infer' and path_or_buf is path-like, the compression is detected from the file extension: '.gz', '.bz2', '.zip', '.xz', or '.zst'. |
index | If True , this parameter includes index values in the JSON string. |
indent | This parameter determines the length of the indent for a record. |
storage_options | This parameter contains extra options (dictionary format), such as host, port, username, etc. |
This example reads in the countries.csv file to a DataFrame. This DataFrame then converts to JSON. Click here to save this CSV file and move it to the current working directory.
df = pd.read_csv('countries.csv').head()
result = df.to_json(indent=4, orient='records', lines=True)
print(result)
- Line [1] reads in the first five (5) rows (head) of the countries.csv file. The output saves to a DataFrame (df).
- Line [2] does the following:
- converts the DataFrame to a JSON format
- formats the output by indenting each record four (4) spaces from the left
- sets the orient parameter to records and lines to True (see above definition).
- saves the output to result.
- Line [3] outputs the result to the terminal.
Output – result
{ "Country":"Germany", "Capital":"Berlin", "Population":83783942, "Area":357021 } |
{ "Country":"France", "Capital":"Paris", "Population":67081000, "Area":551695 } |
{ "Country":"Spain", "Capital":"Madrid", "Population":47431256, "Area":498511 } |
{ "Country":"Italy", "Capital":"Rome", "Population":60317116, "Area":301338 } |
{ "Country":"Poland", "Capital":"Warsaw", "Population":38383000, "Area":312685 } |
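To see how missing values and the lines option behave without the CSV file, the following sketch uses a hypothetical two-row DataFrame with one missing population value. The variable names (json_text, parsed, lines_text, line_records) are illustrative, and Python's standard json module is used only to check the result.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical stand-in for countries.csv, with one missing value.
df = pd.DataFrame({'Country': ['Germany', 'France'],
                   'Population': [83783942.0, np.nan]})

# orient='records' produces a JSON array with one object per row.
json_text = df.to_json(orient='records')
print(json_text)

# Parsing the string back shows that NaN was written as JSON null.
parsed = json.loads(json_text)
print(parsed[1]['Population'])

# lines=True writes one JSON object per line instead of a single array.
lines_text = df.to_json(orient='records', lines=True)
line_records = [json.loads(line) for line in lines_text.strip().splitlines()]
print(line_records)
```

The line-delimited form is convenient for tools that process a file one record at a time, since each line is a complete JSON document.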
DataFrame.to_pickle()
The to_pickle() method converts an object in memory to a byte stream. This object can be stored as a binary file and read back in later.
The syntax for this method is as follows:
DataFrame.to_pickle(path, compression='infer', protocol=5, storage_options=None)
Parameter | Description |
---|---|
path | This parameter is the file path where the pickle file saves. |
compression | If 'infer', the compression is detected from the extension of path: '.gz', '.bz2', '.zip', '.xz', or '.zst'. |
protocol | This parameter is an integer that stipulates the protocol to use. Options are 0-5. Click here for additional details. |
storage_options | This parameter is a dictionary containing additional details such as a host or port. |
This example reads in the finxters.csv file to a DataFrame. The contents of this DataFrame are then saved to a pickle file.
Click here to save this CSV file and move it to the current working directory.
df_users = pd.read_csv('finxters.csv', usecols=['FID', 'Username', 'Password'])
df_users.to_pickle('pickle_file')
- Line [1] reads in three (3) columns from the finxters.csv file. The output saves to a DataFrame (df_users).
- Line [2] saves the contents of the DataFrame to a pickle file.
💡 Note: Navigate to the current working directory to see this file located in the file list.
To learn how to read in a pickle file, click here for details.
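As a quick check that a pickled DataFrame survives a round trip, the sketch below writes a small hypothetical DataFrame to a temporary directory and reads it back with pd.read_pickle(). The file name pickle_file.pkl and the temporary directory are assumptions chosen to keep the example self-contained.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for two columns of finxters.csv.
df_users = pd.DataFrame({'FID': [30022145, 30022192],
                         'Username': ['wildone92', 'AmyP']})

# Write to a temporary directory so no file is left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, 'pickle_file.pkl')
    df_users.to_pickle(pickle_path)

    # Read the pickle file back into a new DataFrame.
    restored = pd.read_pickle(pickle_path)

# True when values, dtypes, and labels all match the original.
print(restored.equals(df_users))
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.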
Further Learning Resources
This is Part 21 of the DataFrame method series.