In this tutorial, you will learn:
- How to create a basic linear regression model
- How to save and load an ML model using the Pickle module
- How to save and load an ML model using the Joblib module
Background and Motivation
Over the past years, Machine Learning (ML) has grown in importance with easy access to data and increasing computational power. Better ML models help to determine future events and decipher consumer trends with greater precision.
For example, ML models built with libraries such as Scikit-learn and Keras help in the diagnosis and detection of skin cancer with high accuracy. Likewise, regression and time series models are widely used in demand forecasting.
ML models go through multiple iterations before they reach the desired level of accuracy, so developing a model requires a considerable amount of time and resources.
Consider a situation where, for every scenario, one has to build an ML model from scratch. Would it even be worthwhile to consider ML for predictions?
Solution Overview
Python offers a solution: several of its modules let you save a trained ML model and load it at a later stage to predict an outcome.
In this article, we will study two different methods of saving and loading ML models using Python:
- Pickle
- Joblib
Packages required: scikit-learn and joblib (pickle ships with the Python standard library).
Method 1: Pickle
In Python, the Pickle module serializes and deserializes an object structure using binary protocols.
ℹ️ Info: Pickling is the process in which an object hierarchy is converted into a byte stream, and unpickling is the exact reverse, where the stored byte stream is converted back into an object.
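As a minimal sketch of the pickling and unpickling round trip described above, the snippet below works entirely in memory with pickle.dumps and pickle.loads. The dictionary and its keys are made up for illustration and are not part of the tutorial's model.

```python
import pickle

# A small object hierarchy (hypothetical): a dict holding a list and a nested dict
original = {"name": "wine", "weights": [0.1, 0.2, 0.3], "meta": {"version": 1}}

# Pickling: convert the object hierarchy into a byte stream
byte_stream = pickle.dumps(original)

# Unpickling: rebuild an equal object from the byte stream
restored = pickle.loads(byte_stream)

print(type(byte_stream))     # <class 'bytes'>
print(restored == original)  # True: same content
print(restored is original)  # False: a new object was created
```

The restored object is a copy with equal content, not the original object itself.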
To demonstrate the Pickle module, we will perform the steps below:
- Build a plain vanilla linear regression model
- Save the model (Serialization or pickling)
- Load the saved model (Deserialization or unpickling)
Step 1: Load all required packages from sklearn and pickle
# Import packages
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import pickle
Step 2: Load data set
The scikit-learn module ships with several built-in datasets, such as the wine data set.
# Load wine data
Wine_data = load_wine()
Step 3: Split wine data
# Split data into test and train datasets
X_train, X_test, Y_train, Y_test = train_test_split(
    Wine_data.data,
    Wine_data.target,
    test_size=0.33,
    random_state=3,
    stratify=Wine_data.target,
)
The train_test_split function splits the wine dataset into two sets, train and test. The model uses the train set to find the optimal weights for the regression. The test set provides an unbiased measure of the model's effectiveness.
Test size can be any value between 0 and 1. A value of 0.33 means that 33% of the data is used as test data and 67% for training the model.
Random state sets the seed for the shuffling applied before the split, so the same value reproduces the same split on every run.
Stratify keeps the class proportions of the given labels the same in the train and test sets.
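The effect of these three parameters can be checked on a small example. The sketch below uses made-up toy data (12 samples, two classes in a 2:1 ratio) rather than the wine data set, and the helper name split is hypothetical.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 12 samples, two classes in a 2:1 ratio
X = [[i] for i in range(12)]
y = [0] * 8 + [1] * 4


def split(seed):
    # test_size=0.25 keeps 3 of the 12 samples for testing
    return train_test_split(X, y, test_size=0.25, random_state=seed, stratify=y)


X_train, X_test, y_train, y_test = split(seed=3)
_, X_test_again, _, _ = split(seed=3)

print(len(X_train), len(X_test))          # 9 3
print(X_test == X_test_again)             # True: same seed, same split
print(Counter(y_train), Counter(y_test))  # both keep the 2:1 class ratio
```

Changing the seed changes which samples land in each set, while stratify still preserves the class ratio.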
Step 4: Create and Train Linear Regression Model
# Initiate Linear Regression and train the model
lreg = LinearRegression().fit(X_train, Y_train)
Step 5: Evaluate R squared for the train and the test data
# Evaluate R squared for train and test data
print(lreg.score(X_train, Y_train))
print(lreg.score(X_test, Y_test))

===== RESTART: /Users/mayankchandra/Documents/Python/ML_Save_trial.py =====
0.914045280377521
0.855120280462724
The shell returned the above values for R squared, which measures how well the model fits the data. The higher the R squared score, the better the fit.
P.S. Since the training dataset was used to build the model, it has a better score than the test dataset.
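For intuition about what the score means, R squared can be computed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function below is an illustrative sketch, not scikit-learn's implementation, and the sample values are made up.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]      # predictions match exactly
mean_only = [2.5, 2.5, 2.5, 2.5]    # always predicts the mean of y_true

print(r_squared(y_true, perfect))    # 1.0
print(r_squared(y_true, mean_only))  # 0.0
```

A score of 1.0 is a perfect fit, while 0.0 means the model does no better than predicting the mean.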
Step 6: Save the model using Pickle
# Save model using dump function of pickle
pck_file = "Pck_LR_Model.pkl"
with open(pck_file, 'wb') as file:
    pickle.dump(lreg, file)
The dump function serializes the linear regression model and writes it to the pickle file.
The pickle file 'Pck_LR_Model.pkl' will be stored in the current working directory.
Step 7: Load the model and evaluate R squared
# Reload model using load function of pickle
with open(pck_file, 'rb') as file:
    Pickled_LR = pickle.load(file)

# Validate the R squared value of the test data; it should match the original model
print(Pickled_LR.score(X_test, Y_test))
The load function will reload the model into the Pickled_LR object.
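One way to check that nothing is lost in the round trip is to compare the coefficients and predictions of the original and reloaded models. The sketch below assumes a made-up toy dataset (y = 2x + 1) and a temporary file named toy_model.pkl, both hypothetical, so it does not touch the tutorial's files.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (hypothetical): y = 2x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)

# Save to and reload from a temporary file that is cleaned up afterwards
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.pkl")
    with open(path, "wb") as file:
        pickle.dump(model, file)
    with open(path, "rb") as file:
        reloaded = pickle.load(file)

# The reloaded model should carry the same weights and give the same predictions
same_coefficients = np.array_equal(model.coef_, reloaded.coef_)
same_predictions = np.array_equal(model.predict(X), reloaded.predict(X))
print(same_coefficients, same_predictions)
```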
Method 2: Joblib
Joblib is a set of tools for lightweight pipelining in Python. Its dump and load functions are optimized for Python objects that carry large data, such as NumPy arrays.
Now let's see how Joblib saves our existing Linear Regression model and reloads it at a later stage for future use.
Step 1: Import joblib
# Import joblib
import joblib
Step 2: Save the model using the dump() function
# Save linear regression model using joblib
jlib_file = "Jlib_LR_Model.pkl"
joblib.dump(lreg, jlib_file)
Step 3: Reload the model using the load function
# Reload model using load function of joblib
Joblib_LR = joblib.load(jlib_file)

# Validate the R squared value of the test data; it should match the original model
print(Joblib_LR.score(X_test, Y_test))
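Joblib's dump function also accepts an optional compress argument, which the steps above do not use. The sketch below is an illustration under assumptions: the dictionary with a large array of zeros is a hypothetical stand-in for a big fitted model, and the file names are made up.

```python
import os
import tempfile

import joblib
import numpy as np

# Stand-in for a large fitted model (hypothetical): a dict holding a big array
big_object = {"weights": np.zeros((1000, 1000))}

with tempfile.TemporaryDirectory() as tmp_dir:
    raw_path = os.path.join(tmp_dir, "raw.joblib")
    small_path = os.path.join(tmp_dir, "compressed.joblib")

    joblib.dump(big_object, raw_path)                # default: no compression
    joblib.dump(big_object, small_path, compress=3)  # compression level 3

    raw_size = os.path.getsize(raw_path)
    small_size = os.path.getsize(small_path)
    restored = joblib.load(small_path)

print(small_size < raw_size)
print(np.array_equal(restored["weights"], big_object["weights"]))
```

Compression trades some extra time on save and load for a smaller file, and how much space it saves depends on the data. An array of zeros compresses far better than typical model weights would.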
Summary
In this article, we learned two methods to save and load an ML model:
- Pickle: Serialize and deserialize Python objects
- Joblib: Efficiently serializes Python objects that carry large data, with optional compression