Python Pandas Input/Output – Pickling

Whether you are leaning towards a career as a Data Scientist or are a coder looking to expand your skillset, the art of pickling is a must-have. This article focuses on creating, saving, and reading various object types to/from a pickle file.

Syntax

pandas.read_pickle(filepath_or_buffer, compression='infer', storage_options=None)

The return value is an unpickled object of the same data type as the object stored in the initial pickle file.
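To illustrate the point about the return type, here is a minimal sketch that pickles a DataFrame and a Series and reads each one back. The sample data, variable names, and file names are hypothetical, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, used only for this illustration.
frame = pd.DataFrame({'score': [98, 51, 87]})
series = pd.Series([1, 2, 3], name='attempts')

with tempfile.TemporaryDirectory() as tmp_dir:
    frame_path = os.path.join(tmp_dir, 'frame.pkl')
    series_path = os.path.join(tmp_dir, 'series.pkl')

    # Write each object to its own pickle file.
    frame.to_pickle(frame_path)
    series.to_pickle(series_path)

    # read_pickle() hands back the same kind of object that was stored.
    frame_back = pd.read_pickle(frame_path)
    series_back = pd.read_pickle(series_path)

print(type(frame_back).__name__)   # DataFrame
print(type(series_back).__name__)  # Series
print(frame_back.equals(frame))    # True
```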


Background

Python's pickle module serializes and de-serializes object structures. Most Python objects can be pickled, saved to a file, and recovered at a later date.

For example, a user is taking a quiz but needs a break. Their progress is saved to a pickle file, which enables the user to pick up seamlessly where they left off.
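The serialize/de-serialize round trip can be sketched in memory with pickle.dumps() and pickle.loads(). The progress record below (its keys and the user name) is a hypothetical stand-in for whatever a real quiz application would store.

```python
import pickle

# Hypothetical progress record for a user who paused after question 3.
progress = {'user': 'quiz_user', 'answers': {1: 'A', 2: 'E', 3: 'B'}}

saved_bytes = pickle.dumps(progress)  # serialize the dict to a bytes object
restored = pickle.loads(saved_bytes)  # de-serialize the bytes back into a dict

print(restored == progress)           # True
print(max(restored['answers']) + 1)   # 4 (the next unanswered question)
```

Writing saved_bytes to a file and reading it back later would give the same result; the in-memory version just keeps the sketch short.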

If you need to share data across various languages/platforms, a pickle file is not the way to go. The pickle format is Python-specific, and a file written with a newer pickle protocol or library version may not load under an older one.

💡 Note: Pickle files may contain malicious data. Only load a pickle file that comes from a trusted source.

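The data types a pickle object accepts are:

  • None, True, and False
  • Integers, floating-point numbers, and complex numbers
  • Strings, bytes, and bytearrays
  • Tuples, lists, sets, and dictionaries that contain only picklable objects
  • Functions and classes defined at the top level of a module
  • Instances of such classes, provided their attributes are picklable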


Preparation

This article uses two (2) libraries, only one of which requires installation.

  • The Pandas library enables access to/from a DataFrame.
  • The Pickle library allows reading/writing to/from a Pickle file. It is part of the Python standard library, so it needs no installation.

To install Pandas, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.

$ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

If the installation was successful, a message displays in the terminal indicating the same.


Feel free to view the PyCharm installation guide for the required libraries.


Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

import pandas as pd
import pickle

Save Dictionary to Pickle File

Expanding on the example above, a new user signs up for a quiz on the Finxter Academy website. This quiz contains 25 questions. The user can take as long as needed to complete it and can start/stop whenever they want. What a great place to use a pickle file!

The user's answers can be stored in a dictionary and saved to a pickle file. Then, the next time the user restarts the quiz, they could (with additional coding) be placed at the correct quiz position (question 7) and continue.

quiz_dct = {'finxter1042': {1: 'A', 2: 'E', 3: 'B', 4: 'D', 5: 'A', 6: 'E'}}
data = pd.DataFrame(quiz_dct)
data.to_pickle('quiz.pkl')
print(data)
  • Line [1] creates a dictionary for user finxter1042, containing the quiz questions answered to date.
  • Line [2] converts this dictionary to a DataFrame and assigns it to data.
  • Line [3] writes the DataFrame to quiz.pkl and places it in the current working directory.
  • Line [4] outputs the DataFrame to the terminal.

Output

   finxter1042
1            A
2            E
3            B
4            D
5            A
6            E

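The to_pickle() method accepts two (2) additional parameters:

Compression: If not passed as a parameter, infer is assumed, and the compression method is detected from the file extension. The available options are:

  • gzip
  • bz2
  • zip
  • xz
  • None

Protocol: This is an integer that indicates which protocol should be used by the pickler. The default is pickle.HIGHEST_PROTOCOL, which is 5 on Python 3.8 and later (it was 4 on Python 3.4 through 3.7). Therefore, the possible values are 0-5 on current Python versions.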
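As a rough illustration of these two parameters, the sketch below writes the quiz DataFrame with explicit gzip compression and an explicit protocol, then reads it back. The temporary directory and the quiz.pkl.gz file name are assumptions made for this example only.

```python
import os
import tempfile

import pandas as pd

quiz_dct = {'finxter1042': {1: 'A', 2: 'E', 3: 'B', 4: 'D', 5: 'A', 6: 'E'}}
data = pd.DataFrame(quiz_dct)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; the '.gz' suffix matches the compression used.
    gz_path = os.path.join(tmp_dir, 'quiz.pkl.gz')

    # Pass both optional parameters explicitly.
    data.to_pickle(gz_path, compression='gzip', protocol=4)

    # The default compression='infer' detects gzip from the '.gz' extension.
    restored = pd.read_pickle(gz_path)

    # Peek at the first two bytes of the file to confirm it is gzip data.
    with open(gz_path, 'rb') as fh:
        magic = fh.read(2)

print(restored.equals(data))  # True
print(magic == b'\x1f\x8b')   # True (the gzip magic number)
```

Choosing protocol=4 here is only a design example: a lower protocol keeps the file readable by older Python versions, at the cost of the newer protocol's features.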


Read Dictionary Pickle File to DataFrame

The pandas.read_pickle() function loads (reads) a pickled pandas object from a file. The example below reads quiz.pkl back into a DataFrame and then saves a copy of it to a new pickle file.

To perform this task, run the following code:

udf = pd.read_pickle('quiz.pkl')
udf.to_pickle('finxter1042.pkl')
print(udf)
  • Line [1] unpickles and loads (reads) the existing pickle file and assigns it to the DataFrame udf.
  • Line [2] saves a copy of the DataFrame to finxter1042.pkl.
  • Line [3] outputs the contents of udf to the terminal.

Output

   finxter1042
1            A
2            E
3            B
4            D
5            A
6            E
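The "additional coding" needed to resume the quiz might look something like the sketch below. It assumes questions are answered in order, so the next question is one past the highest answered number. The names user_id, total_questions, and next_question are hypothetical, and the DataFrame is rebuilt inline so the snippet runs without quiz.pkl on disk.

```python
import pandas as pd

# Stand-in for udf = pd.read_pickle('quiz.pkl'), so no file is required.
quiz_dct = {'finxter1042': {1: 'A', 2: 'E', 3: 'B', 4: 'D', 5: 'A', 6: 'E'}}
udf = pd.DataFrame(quiz_dct)

user_id = 'finxter1042'
total_questions = 25  # the quiz length mentioned in this article

# Assumption: answers are recorded in order, with question numbers as the index.
answered = udf[user_id].dropna()
next_question = int(answered.index.max()) + 1

if next_question <= total_questions:
    print(f'{user_id} resumes at question {next_question}')
else:
    print(f'{user_id} has completed the quiz')
```

With the six answers stored above, this prints that finxter1042 resumes at question 7.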

Save Tuple of Tuples to Pickle File

For this example, we have a Tuple of Tuples that contains Student IDs and their respective Grade. Run the code below to create the pickle file.

💡 Note: Using the dump() function is another way to save a pickle file.

std_grades = ((1042, 98), (1043, 51), (1044, 87), (1045, 65))
with open('grades.pkl', 'wb') as tuplefile:
    pickle.dump(std_grades, tuplefile)
  • Line [1] declares a tuple of tuples containing two elements each: Student ID and Grade.
  • Line [2] opens a grades.pkl file for writing in binary mode. The with statement closes the file automatically when the block ends.
  • Line [3] passes two parameters to the dump() function: the tuple of tuples and the open file object. This file saves to the current working directory.

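Output

This code prints nothing to the terminal. It creates the file grades.pkl in the current working directory.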


Read Tuple of Tuples Pickle File to DataFrame

To read in the pickle file created above and assign it to a DataFrame, run the following code:

with open('grades.pkl', 'rb') as pickle_in:
    data_in = pickle.load(pickle_in)
df = pd.DataFrame(data_in, columns=['SID', 'Grade'])
print(df)
  • Line [1] opens the pickle file created earlier for reading in binary mode. The with statement closes the file once the data is loaded.
  • Line [2] loads in the contents and assigns them to data_in.
  • Line [3] creates a DataFrame, and two columns display as headings for the tuple.
  • Line [4] outputs the DataFrame to the terminal.

Output

 SID  Grade
1042     98
1043     51
1044     87
1045     65
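As a possible alternative to calling the pickle module directly, the top-level pd.to_pickle() and pd.read_pickle() functions can also handle objects that are not DataFrames. The sketch below round-trips the same tuple of tuples through a temporary file. The temporary directory is an assumption made to keep the example self-contained.

```python
import os
import tempfile

import pandas as pd

std_grades = ((1042, 98), (1043, 51), (1044, 87), (1045, 65))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'grades.pkl')

    # pd.to_pickle() takes the object first, then the destination path.
    pd.to_pickle(std_grades, path)

    # The tuple of tuples comes back as a tuple, not as a DataFrame.
    grades_back = pd.read_pickle(path)

df = pd.DataFrame(grades_back, columns=['SID', 'Grade'])

print(type(grades_back).__name__)  # tuple
print(df['Grade'].mean())          # 75.25
```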