Python Pandas Input/Output – Pickling

Whether you are pursuing a career as a Data Scientist or are a coder looking to expand your skill set, pickling is a must-have skill. This article focuses on creating, saving, and reading various object types to and from a pickle file.

Syntax

pandas.read_pickle(filepath_or_buffer, compression='infer', storage_options=None)

The return value is an unpickled object of the same data type as the object stored in the initial pickle file.
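As a minimal sketch of that return-value behavior, the snippet below writes a small DataFrame to a temporary file and reads it back. The temporary directory, the file name demo.pkl, and the sample data are assumptions made for illustration only.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, used only for this illustration.
original = pd.DataFrame({'score': [98, 51, 87]})

# Work in a temporary directory so no files are left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.pkl')
    original.to_pickle(path)
    restored = pd.read_pickle(path)

# The unpickled object has the same type and contents as the original.
print(type(restored).__name__)     # DataFrame
print(restored.equals(original))   # True
```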

Background

Python's pickle module serializes and de-serializes object structures. Most Python objects can be pickled, saved to a file, and recovered at a later date. For example, a user is taking a quiz but needs a break. Their progress is saved to a pickle file, which lets the user pick up seamlessly where they left off.
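The round trip described above can be sketched in memory with pickle.dumps() and pickle.loads(), which work with bytes rather than files. The progress dictionary here is a hypothetical stand-in for the quiz state.

```python
import pickle

# Hypothetical quiz state: answers given so far, keyed by question number.
progress = {'user': 'finxter1042', 'answers': {1: 'A', 2: 'E', 3: 'B'}}

# Serialize the object to a bytes string...
payload = pickle.dumps(progress)

# ...and rebuild an equal, but separate, object from those bytes.
restored = pickle.loads(payload)

print(isinstance(payload, bytes))   # True
print(restored == progress)         # True
print(restored is progress)         # False
```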

If you need to share data across various languages/platforms, a pickle file is not the way to go. The pickle format is specific to Python, and files written with a newer pickle protocol may not load in older Python versions.

Note: Pickle files may contain malicious data. Unpickling can execute arbitrary code, so only load a pickle file that comes from a source you trust.
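To see why the source of a pickle file matters, consider this deliberately harmless sketch. The class name Surprise is hypothetical; its __reduce__() method tells the unpickler which callable to run on load, here the built-in sorted(). A hostile file could name a far more dangerous callable in the same way.

```python
import pickle


class Surprise:
    """Hypothetical class that controls how it is rebuilt on load."""

    def __reduce__(self):
        # Tell the unpickler: "to rebuild me, call sorted([3, 1, 2])".
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Surprise())

# Loading runs the callable named inside the payload.
result = pickle.loads(payload)

print(result)                        # [1, 2, 3]
print(isinstance(result, Surprise))  # False
```

The loaded object is not a Surprise instance at all: it is whatever the callable inside the file returned.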

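The data types that can be pickled include:

  • None, Booleans, integers, floats, and complex numbers
  • strings, bytes, and bytearrays
  • tuples, lists, sets, and dictionaries that contain only picklable objects
  • functions and classes defined at the top level of a module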

Getting Started

Remember to add the Required Starter Code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

Required Starter Code:

import pandas as pd
import pickle

Install Required Libraries

Before any data manipulation can occur, one new library will require installation. The pandas library enables access to/from a DataFrame. The pickle library, which provides access to read/save pickle files and much more, is part of the Python standard library and needs no installation.

To install pandas, navigate to an IDE terminal. At the command prompt, execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

Code:

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation is successful, a message displays in the terminal indicating the same.
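One quick way to confirm the environment is ready is to import both modules and print some version details. This is only a sanity check; the exact values depend on your installation.

```python
import pickle

import pandas as pd

# pandas is a third-party package, so it reports its own version.
print('pandas version:', pd.__version__)

# pickle is part of the standard library and needs no installation.
print('highest pickle protocol:', pickle.HIGHEST_PROTOCOL)
print('default pickle protocol:', pickle.DEFAULT_PROTOCOL)
```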

Save Dictionary to Pickle File

Expanding on the example above, a new user signs up for a quiz on the Finxter Academy website. This quiz contains 25 questions. The user can take as long as needed to complete. They can start/stop whenever they want. What a great place to use a pickle file!

The pickle file can save the details as shown in the dictionary below. The next time the user restarts the quiz, they could (with additional coding) be placed at the correct quiz position (question 7) and continue.

quiz_dct = {'finxter1042': {1: 'A', 2: 'E', 3: 'B', 4: 'D', 5: 'A', 6: 'E'}}
data = pd.DataFrame(quiz_dct)
data.to_pickle('quiz.pkl')
print(data)
  • Line [1] creates a dictionary for user finxter1042, containing the quiz questions answered to date.
  • Line [2] converts this dictionary to a DataFrame and assigns it to data.
  • Line [3] writes the DataFrame to quiz.pkl and places it in the current working directory.
  • Line [4] outputs the DataFrame to the terminal.

Output:

   finxter1042
1            A
2            E
3            B
4            D
5            A
6            E

Note: Two additional parameters of to_pickle() are worth knowing:

Compression: If not passed as a parameter, infer is assumed. The available options are:

  • gzip
  • bz2
  • zip
  • xz
  • None

Protocol: This is an integer that indicates which protocol the pickler should use. The default is pickle.HIGHEST_PROTOCOL, which is 5 on Python 3.8 and later. The possible values are 0-5.
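As a sketch of how these two parameters might be used together, the snippet below writes the quiz DataFrame with gzip compression and an explicit protocol, then reads it back. The file name quiz.pkl.gz and the temporary directory are assumptions for illustration; protocol 4 is chosen only as an example value.

```python
import os
import tempfile

import pandas as pd

quiz_dct = {'finxter1042': {1: 'A', 2: 'E', 3: 'B', 4: 'D', 5: 'A', 6: 'E'}}
data = pd.DataFrame(quiz_dct)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'quiz.pkl.gz')

    # Compress with gzip and pin the pickle protocol explicitly.
    data.to_pickle(path, compression='gzip', protocol=4)

    # With the default compression='infer', the .gz extension is
    # enough for read_pickle() to pick gzip on its own.
    restored = pd.read_pickle(path)

print(restored.equals(data))   # True
```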

Read Dictionary Pickle File to DataFrame

The pandas.read_pickle() function loads (reads) pickled pandas objects from a file. In this example, the existing pickle file is loaded and then saved again under a new filename. To perform these tasks, run the following code:

udf = pd.read_pickle('quiz.pkl')
udf.to_pickle('finxter1042.pkl')
print(udf)
  • Line [1] unpickles and loads (reads) the existing pickle file and assigns it to the DataFrame udf.
  • Line [2] saves a copy of the DataFrame to finxter1042.pkl.
  • Line [3] outputs the contents of udf to the terminal.

Output:

   finxter1042
1            A
2            E
3            B
4            D
5            A
6            E
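The "additional coding" mentioned earlier could look something like the sketch below, which works out the next unanswered question from the saved answers. It assumes that questions are answered in order, that the DataFrame index holds the question numbers, and that the quiz has 25 questions. The names TOTAL_QUESTIONS, last_answered, and next_question are hypothetical.

```python
import os
import tempfile

import pandas as pd

TOTAL_QUESTIONS = 25  # quiz length, as stated in the example above

quiz_dct = {'finxter1042': {1: 'A', 2: 'E', 3: 'B', 4: 'D', 5: 'A', 6: 'E'}}

# Save and reload the progress so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'quiz.pkl')
    pd.DataFrame(quiz_dct).to_pickle(path)
    udf = pd.read_pickle(path)

# The index holds the question numbers answered so far.
last_answered = int(udf.index.max())
next_question = last_answered + 1

if next_question <= TOTAL_QUESTIONS:
    print(f'Resume at question {next_question}')   # Resume at question 7
else:
    print('Quiz complete')
```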

Save Tuple of Tuples to Pickle File

For this example, we have a tuple of tuples, each containing a Student ID and the respective Grade. Run the code below to create the pickle file.

Note: Using the dump() function is another way to save a pickle file.

std_grades = ((1042, 98), (1043, 51), (1044, 87), (1045, 65))
with open('grades.pkl', 'wb') as tuplefile:
    pickle.dump(std_grades, tuplefile)
  • Line [1] declares a tuple of tuples containing two elements each: Student ID and Grade.
  • Line [2] opens a grades.pkl file for writing in binary mode. The with statement closes the file automatically when the block ends.
  • Line [3] passes two arguments to the dump() function: the tuple of tuples and the open file object. This file saves to the current working directory.

Output:

Nothing prints to the terminal. The grades.pkl file appears in the current working directory.

Read Tuple of Tuples Pickle File to DataFrame

To read in the pickle file created above and assign it to a DataFrame, run the following code:

with open('grades.pkl', 'rb') as pickle_in:
    data_in = pickle.load(pickle_in)
df = pd.DataFrame(data_in, columns=['SID', 'Grade'])
print(df)
  • Line [1] opens the pickle file created earlier for reading in binary mode. The with statement closes the file automatically when the block ends.
  • Line [2] loads in the contents and assigns them to data_in.
  • Line [3] creates a DataFrame with two column headings, SID and Grade, for the tuple values.
  • Line [4] outputs the DataFrame to the terminal.

Output:

 SID  Grade
1042     98
1043     51
1044     87
1045     65
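Since pandas.read_pickle() returns whatever object was stored, it can also load a file written by pickle.dump(). The sketch below shows this under the assumption that the file holds the tuple of tuples from this example. It uses a temporary directory so it can run on its own.

```python
import os
import pickle
import tempfile

import pandas as pd

std_grades = ((1042, 98), (1043, 51), (1044, 87), (1045, 65))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'grades.pkl')

    # Write the tuple with the pickle module...
    with open(path, 'wb') as tuplefile:
        pickle.dump(std_grades, tuplefile)

    # ...and read it back with pandas; the result is still a tuple.
    data_in = pd.read_pickle(path)

df = pd.DataFrame(data_in, columns=['SID', 'Grade'])
print(type(data_in).__name__)   # tuple
print(df['Grade'].mean())       # 75.25
```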