If you are leaning towards a career as a Data Scientist or just a coder looking to expand your skillset, the art of pickling is a must-have. This article focuses on creating, saving, and reading various object types to/from a pickle file.
Syntax
pandas.read_pickle(filepath_or_buffer, compression='infer', storage_options=None)
The return value is the unpickled object, which has the same type as the object originally stored in the pickle file.
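`read_pickle()` simply unpickles whatever the file contains, so the returned object does not have to be a DataFrame. The minimal sketch below assumes a throwaway temporary directory and a hypothetical `settings` dictionary. It round-trips a plain dict through pandas' top-level `pd.to_pickle()` and `pd.read_pickle()` functions and checks that the type survives.

```python
import os
import tempfile

import pandas as pd

# A plain dict (hypothetical example data), not a DataFrame.
settings = {"theme": "dark", "font_size": 12}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "settings.pkl")
    pd.to_pickle(settings, path)      # pandas' module-level pickling helper
    restored = pd.read_pickle(path)   # returns the unpickled object as-is

print(type(restored).__name__, restored == settings)  # dict True
```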
Background
Python's pickle module serializes and deserializes object structures. Most Python objects can be pickled, saved to a file, and recovered at a later date.
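To make "serialize" and "deserialize" concrete, here is a small sketch using the pickle module's in-memory `dumps()` and `loads()` functions. The `record` dictionary is made-up example data. The sketch also shows that not every object can be pickled: trying to pickle a generator raises `TypeError`.

```python
import pickle

record = {'user': 'finxter1042', 'answers': ['A', 'E', 'B'], 'finished': False}

# Serialize (pickle) the object into a bytes string...
blob = pickle.dumps(record)

# ...then deserialize (unpickle) it into a new, equal object.
restored = pickle.loads(blob)
print(type(blob).__name__, restored == record, restored is record)  # bytes True False

# Not every object can be pickled; a generator is one example.
unpicklable_error = None
try:
    pickle.dumps(x for x in range(3))
except TypeError as exc:
    unpicklable_error = str(exc)
print('TypeError:', unpicklable_error)
```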
For example, a user is taking a quiz but needs a break. Their progress is saved to a pickle file, which lets them pick up exactly where they left off.
If you need to share data across programming languages, pickle is not the way to go. The format is specific to Python, and a file written with a newer pickle protocol cannot be read by an older Python version that does not support that protocol.
💡Note: Pickle files may contain malicious data, because unpickling can execute arbitrary code. Only load pickle files from sources you trust.
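The Python `pickle` documentation ("Restricting Globals") describes one mitigation: subclass `pickle.Unpickler` and override `find_class()` so that only allowlisted classes and functions can be loaded. The sketch below follows that pattern with a hypothetical `SAFE_BUILTINS` allowlist and a hypothetical `restricted_loads()` helper. It is not a complete defense. Loading a DataFrame this way would also require allowlisting the pandas and NumPy classes involved, which this sketch does not attempt.

```python
import builtins
import io
import pickle

# Hypothetical allowlist: the only globals this loader will resolve.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import anything outside the allowlist."""

    def find_class(self, module, name):
        # find_class() runs whenever the pickle references a global
        # (a class or function). Plain dicts, lists, strings, and numbers
        # never reach it.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads(), but only allows the globals listed above."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


print(restricted_loads(pickle.dumps({"grades": [98, 51], "ids": (1042, 1043)})))

blocked_message = None
try:
    restricted_loads(pickle.dumps(print))  # the pickle references builtins.print
except pickle.UnpicklingError as exc:
    blocked_message = str(exc)
print("Blocked:", blocked_message)
```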
Data types that can be pickled include:
- Dictionaries (used in this article)
- Tuples (used in this article)
- Lists
- Boolean, Integers, Floats, Strings, and more
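As a quick sanity check of the list above, this sketch pickles one sample of each type inside a single tuple. It then confirms that every value comes back equal to the original and with the same type. The sample values are illustrative only.

```python
import pickle

# One sample of each type listed above (values are illustrative only).
samples = (
    {'finxter1042': {1: 'A', 2: 'E'}},   # dictionary
    ((1042, 98), (1043, 51)),            # tuple of tuples
    ['A', 'E', 'B'],                     # list
    True, 25, 98.5, 'quiz',              # Boolean, integer, float, string
)

restored = pickle.loads(pickle.dumps(samples))

for original, value in zip(samples, restored):
    # Each value should come back equal to, and of the same type as, the original.
    print(type(value).__name__, value == original)
```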
Preparation
Before any data manipulation can occur, one (1) new library will require installation.
- The Pandas library provides the DataFrame structure along with the read_pickle() and to_pickle() functions.
- The pickle module reads and writes pickle files. It is part of Python's standard library, so it does not need to be installed.
To install Pandas, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter> key on the keyboard to start the installation process.
If the installation was successful, the terminal displays a message confirming it.
Feel free to view the PyCharm installation guide for the required library.
- How to install Pandas on PyCharm
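After installation, you can confirm that both modules can be imported by asking Python where it would load them from. This minimal sketch uses `importlib.util.find_spec()`, which returns `None` for a module that cannot be found. The `status` dictionary is a hypothetical helper used only to print the result.

```python
import importlib.util

# find_spec() returns None when a module cannot be found on the import path.
status = {name: importlib.util.find_spec(name) is not None for name in ('pickle', 'pandas')}

for name, found in status.items():
    print(f"{name}: {'available' if found else 'missing'}")
```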
Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.
import pandas as pd
import pickle
Save Dictionary to Pickle File
Expanding on the example above, a new user signs up for a quiz on the Finxter Academy website. This quiz contains 25 questions, and the user can take as long as needed to complete it, starting and stopping whenever they want. What a great place to use a pickle file!
The user's answers can be stored in a dictionary and saved to a pickle file. Then, the next time the user restarts the quiz, they could (with additional coding) be placed at the correct quiz position (question 7) and continue.
quiz_dct = {'finxter1042': {1: 'A', 2: 'E', 3: 'B', 4: 'D', 5: 'A', 6: 'E'}}
data = pd.DataFrame(quiz_dct)
data.to_pickle('quiz.pkl')
print(data)
- Line [1] creates a dictionary for user finxter1042, containing the answers submitted so far.
- Line [2] converts this dictionary to a DataFrame and assigns it to data. The outer key becomes the column name, and the question numbers become the index.
- Line [3] writes the DataFrame to quiz.pkl in the current working directory.
- Line [4] outputs the DataFrame to the terminal.
Output
 | finxter1042 |
1 | A |
2 | E |
3 | B |
4 | D |
5 | A |
6 | E |
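The to_pickle() method accepts two (2) additional parameters:
Compression: Defaults to 'infer', which detects the compression method from the file extension (for example, .gz, .bz2, .zip, or .xz) and uses no compression if the extension is not recognized. The available options include: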
- gzip
- bz2
- zip
- xz
- None
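Protocol: An integer indicating which pickle protocol the pickler should use. The default is pickle.HIGHEST_PROTOCOL, which is 5 in Python 3.8 and later (4 in Python 3.4 to 3.7). Valid values range from 0 to HIGHEST_PROTOCOL, and a negative value is equivalent to HIGHEST_PROTOCOL.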
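The sketch below exercises both parameters on the quiz DataFrame. It writes into a temporary directory rather than the working directory and assumes a recent pandas version. The file names are hypothetical. It also checks the first bytes of each file to confirm that compression was actually applied.

```python
import os
import pickle
import tempfile

import pandas as pd

quiz_dct = {'finxter1042': {1: 'A', 2: 'E', 3: 'B', 4: 'D', 5: 'A', 6: 'E'}}
data = pd.DataFrame(quiz_dct)

with tempfile.TemporaryDirectory() as tmp:
    # compression='infer' (the default) chooses gzip from the .gz extension.
    gz_path = os.path.join(tmp, 'quiz.pkl.gz')
    data.to_pickle(gz_path)

    # Explicit bz2 compression plus protocol 4, which Python 3.4+ can read.
    bz2_path = os.path.join(tmp, 'quiz_p4.pkl')
    data.to_pickle(bz2_path, compression='bz2', protocol=4)

    # Peek at the first bytes to confirm each file really is compressed.
    with open(gz_path, 'rb') as fh:
        gz_magic = fh.read(2)    # gzip files start with 1f 8b
    with open(bz2_path, 'rb') as fh:
        bz2_magic = fh.read(3)   # bzip2 files start with b'BZh'

    # read_pickle() infers gzip from the .gz extension, but the .pkl file
    # has no telltale extension, so its compression must be stated.
    from_gz = pd.read_pickle(gz_path)
    from_bz2 = pd.read_pickle(bz2_path, compression='bz2')

print('HIGHEST_PROTOCOL:', pickle.HIGHEST_PROTOCOL)
print(from_gz.equals(data), from_bz2.equals(data))  # True True
```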
Read Dictionary Pickle File to DataFrame
The pandas.read_pickle() function loads (reads) a pickled object, typically a pandas object, from a file. In the example below, the loaded DataFrame is then saved to a new pickle file.
To perform this task, run the following code:
udf = pd.read_pickle('quiz.pkl')
udf.to_pickle('finxter1042.pkl')
print(udf)
- Line [1] unpickles and loads (reads) the existing pickle file and assigns the resulting DataFrame to udf.
- Line [2] saves a copy of the DataFrame to finxter1042.pkl.
- Line [3] outputs the contents of udf to the terminal.
Output
 | finxter1042 |
1 | A |
2 | E |
3 | B |
4 | D |
5 | A |
6 | E |
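The "additional coding" mentioned earlier for resuming the quiz can be sketched in a few lines: reload the pickled answers, find the highest question number answered, and move to the next one. `TOTAL_QUESTIONS` and `next_question` are hypothetical names. The sketch assumes questions are numbered consecutively from 1.

```python
import os
import tempfile

import pandas as pd

quiz_dct = {'finxter1042': {1: 'A', 2: 'E', 3: 'B', 4: 'D', 5: 'A', 6: 'E'}}
TOTAL_QUESTIONS = 25  # from the quiz description above

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'quiz.pkl')
    pd.DataFrame(quiz_dct).to_pickle(path)

    # Later session: reload the saved answers for this user.
    udf = pd.read_pickle(path)

# dropna() skips unanswered questions (NaN) if several users share the DataFrame.
answers = udf['finxter1042'].dropna()
next_question = int(answers.index.max()) + 1 if len(answers) else 1

if next_question <= TOTAL_QUESTIONS:
    print(f"Resume at question {next_question} of {TOTAL_QUESTIONS}")  # Resume at question 7 of 25
```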
Save Tuple of Tuples to Pickle File
For this example, we have a tuple of tuples that contains student IDs and their respective grades. Run the code below to create the pickle file.
💡Note: The pickle module's dump() function is another way to save a pickle file.
std_grades = ((1042, 98), (1043, 51), (1044, 87), (1045, 65))
with open('grades.pkl', 'wb') as tuplefile:
    pickle.dump(std_grades, tuplefile)
- Line [1] declares a tuple of tuples containing two elements each: Student ID and Grade.
- Line [2] opens grades.pkl for writing in binary mode ('wb') and assigns the file object to tuplefile. The with statement closes the file automatically when the block ends.
- Line [3] passes two arguments to the dump() function: the tuple of tuples and the open file object. The file saves to the current working directory.
This code produces no terminal output. Instead, grades.pkl appears in the current working directory.
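Because dump() writes to an open file object, you can pickle several objects into the same file by calling it repeatedly. pickle.load() then reads them back one at a time, in the order they were written, and raises EOFError once the file is exhausted. The `course` dictionary below is a hypothetical second object added for illustration.

```python
import os
import pickle
import tempfile

std_grades = ((1042, 98), (1043, 51), (1044, 87), (1045, 65))
course = {'name': 'Python Basics', 'term': 'Fall'}  # hypothetical second object

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'grades.pkl')

    # Each dump() call appends one more pickle to the open file.
    with open(path, 'wb') as fh:
        pickle.dump(std_grades, fh)
        pickle.dump(course, fh)

    # load() reads exactly one pickle per call; reading past the last one
    # raises EOFError, which ends the loop.
    loaded = []
    with open(path, 'rb') as fh:
        while True:
            try:
                loaded.append(pickle.load(fh))
            except EOFError:
                break

print(loaded[0] == std_grades, loaded[1] == course)  # True True
```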
Read Tuple of Tuples Pickle File to DataFrame
To read in the pickle file created above and assign it to a DataFrame, run the following code:
with open('grades.pkl', 'rb') as pickle_in:
    data_in = pickle.load(pickle_in)
df = pd.DataFrame(data_in, columns=['SID', 'Grade'])
print(df)
- Line [1] opens the pickle file created earlier in binary read mode ('rb').
- Line [2] loads the contents and assigns them to data_in.
- Line [3] creates a DataFrame from the tuple of tuples, with SID and Grade as the column headings.
- Line [4] outputs the DataFrame to the terminal.
Output
SID | Grade |
1042 | 98 |
1043 | 51 |
1044 | 87 |
1045 | 65 |
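Since pandas.read_pickle() can load any pickled object, it can also read the plain tuple of tuples in grades.pkl, replacing the open()/pickle.load() pair. This minimal sketch recreates the file in a temporary directory so it runs on its own.

```python
import os
import pickle
import tempfile

import pandas as pd

std_grades = ((1042, 98), (1043, 51), (1044, 87), (1045, 65))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'grades.pkl')
    with open(path, 'wb') as fh:
        pickle.dump(std_grades, fh)

    # read_pickle() returns the tuple of tuples unchanged...
    data_in = pd.read_pickle(path)

# ...which can then be turned into a DataFrame, as before.
df = pd.DataFrame(data_in, columns=['SID', 'Grade'])
print(df['Grade'].mean())  # 75.25
```

With the data in a DataFrame, ordinary pandas operations apply. The final line computes the class average, (98 + 51 + 87 + 65) / 4 = 75.25.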