Fitting Polynomial Regression Models to Understand Non-linear Trends in Python


πŸ’‘ Problem Formulation: In many real-world scenarios, data shows a non-linear relationship, wherein a straight line cannot effectively capture the trends present. To accurately model these trends, we rely on polynomial regression, which can fit curved lines to data points. For instance, input might be years of experience, and desired output could be the salary range, which often doesn’t scale linearly with experience.

Method 1: Use numpy and sklearn for Polynomial Feature Transformation

Using numpy for handling arrays and sklearn’s PolynomialFeatures for feature transformation allows for an effective approach to polynomial regression. This multi-step process generates a new feature matrix consisting of all polynomial combinations of the features up to the specified degree.

Here’s an example:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Sample data
X = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([15, 11, 2, 8, 25, 32])

# Transforming data
poly = PolynomialFeatures(degree=2, include_bias=False)
poly_features = poly.fit_transform(X)

# Fit the model
model = LinearRegression().fit(poly_features, y)

Output of the model coefficients:

[model.intercept_, model.coef_]

By transforming our input data with PolynomialFeatures, we extend our linear regression model to fit non-linear data. The output is the model’s coefficients, representing the intercept and the influence of each polynomial feature on the target variable.
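Once fitted, the same model can predict values for new inputs. A minimal sketch of this, assuming the setup from the example above (the variable name X_new is introduced here for illustration) — note that new inputs must pass through the same fitted PolynomialFeatures transformer before prediction:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.array([5, 15, 25, 35, 45, 55]).reshape(-1, 1)
y = np.array([15, 11, 2, 8, 25, 32])

poly = PolynomialFeatures(degree=2, include_bias=False)
poly_features = poly.fit_transform(X)
model = LinearRegression().fit(poly_features, y)

# Predict for new inputs: transform them with the SAME fitted poly object
X_new = np.array([20, 40]).reshape(-1, 1)
y_new = model.predict(poly.transform(X_new))
print(y_new.shape)  # (2,)
```

Forgetting the transform step (calling model.predict(X_new) directly) raises a shape error, since the model was trained on two polynomial features per sample.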

Method 2: Define a Custom Polynomial Regression with numpy

Creating a custom polynomial regression model using numpy’s polyfit function allows for a direct approach. This function performs a least squares polynomial fit over the given data, returning the coefficients that minimize the squared error.

Here’s an example:

import numpy as np

# Sample data
X = [5, 15, 25, 35, 45, 55]
y = [15, 11, 2, 8, 25, 32]

# Fit the model
model_coefficients = np.polyfit(X, y, 2)

# Create a polynomial function with the fitted coefficients
model_poly = np.poly1d(model_coefficients)

Output of the model coefficients:

model_coefficients

This code snippet demonstrates how to fit a polynomial regression model by directly calculating the optimal coefficients. Note that np.polyfit returns coefficients in order of decreasing degree, and the np.poly1d function turns them into a callable polynomial. NumPy’s documentation now recommends the newer numpy.polynomial API for new code, but np.polyfit remains widely used. This method is concise and quite powerful for quick computations and analysis.
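The poly1d object behaves like an ordinary function, which makes prediction trivial. A short sketch, reusing the sample data from above:

```python
import numpy as np

X = [5, 15, 25, 35, 45, 55]
y = [15, 11, 2, 8, 25, 32]

coeffs = np.polyfit(X, y, 2)      # highest-degree coefficient first
model_poly = np.poly1d(coeffs)

# Evaluate the fitted polynomial at new points
print(model_poly(30))             # scalar prediction
print(model_poly([10, 20, 30]))  # vectorised over a list of inputs
```

Keep the coefficient order in mind: np.polyfit lists the quadratic term first, which is the opposite of how sklearn’s coef_ is laid out for transformed features.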

Method 3: Leveraging scipy for More Complex Curve Fitting

The scipy library offers advanced curve-fitting capabilities through its optimize module. Leveraging the curve_fit function allows you to fit your data to any kind of model, extending beyond polynomials to any custom function.

Here’s an example:

import numpy as np
from scipy.optimize import curve_fit

# Sample data
X = np.array([5, 15, 25, 35, 45, 55])
y = np.array([15, 11, 2, 8, 25, 32])

# Define the polynomial function
def poly_function(x, a, b, c):
    return a * x**2 + b * x + c

# Fit the model
params, _ = curve_fit(poly_function, X, y)

Output of the model parameters:

params

The curve_fit function from scipy.optimize is used to fit our data to a polynomial model. This method is highly flexible and can be used for more complex relationships by defining your own function that describes the expected trend.
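A practical bonus of curve_fit is its second return value: the covariance matrix of the fitted parameters, from which standard errors can be derived. A minimal sketch, assuming the same quadratic model and data as above (the variable name perr is introduced here for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

X = np.array([5, 15, 25, 35, 45, 55], dtype=float)
y = np.array([15, 11, 2, 8, 25, 32], dtype=float)

def poly_function(x, a, b, c):
    return a * x**2 + b * x + c

params, cov = curve_fit(poly_function, X, y)

# The diagonal of the covariance matrix holds the variance of each
# fitted parameter; its square root is a one-sigma uncertainty.
perr = np.sqrt(np.diag(cov))
print(params)  # fitted a, b, c
print(perr)    # standard error for each parameter
```

These uncertainties are something the pure numpy and sklearn approaches above do not provide out of the box.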

Method 4: Use statsmodels for Detailed Statistical Analysis

For those requiring detailed statistical analysis in their polynomial regression, statsmodels provides extensive summary statistics and diagnostics alongside the modeling capabilities. The Ordinary Least Squares (OLS) function can be used in conjunction with PolynomialFeatures for a rich output of information.

Here’s an example:

import numpy as np
import statsmodels.api as sm
from sklearn.preprocessing import PolynomialFeatures

# Sample data
X = [5, 15, 25, 35, 45, 55]
y = [15, 11, 2, 8, 25, 32]

# Transforming data
poly_features = PolynomialFeatures(degree=2, include_bias=False).fit_transform(np.array(X).reshape(-1, 1))

# Fit the model
model = sm.OLS(y, sm.add_constant(poly_features)).fit()

Access the summary of the model:

print(model.summary())

This method uses statsmodels to perform polynomial regression, including comprehensive statistical summaries of the model’s performance. The OLS function is used after transforming the data into polynomial features. It’s particularly useful for an in-depth understanding of model diagnostics.

Bonus One-Liner Method 5: Quick Modeling with numpy

For a quick one-liner polynomial regression, numpy can be your tool of choice. It’s straight to the point and can be useful for simple polynomial regression tasks with minimal coding.

Here’s an example:

# Assumes numpy is imported and X, y are defined as in the earlier examples
y_pred = np.poly1d(np.polyfit(X, y, 2))(X)

Output of predicted values:

y_pred

Using numpy’s polyfit in combination with poly1d, we have created a one-liner that fits the data and provides predictions in a compact form. This method is best suited when simplicity and brevity are required, and fewer diagnostics are needed.
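Even with the one-liner, it is worth checking how well the curve actually fits. A small sketch computing the coefficient of determination from the residuals (the names ss_res, ss_tot, and r2 are introduced here for illustration):

```python
import numpy as np

X = [5, 15, 25, 35, 45, 55]
y = np.array([15, 11, 2, 8, 25, 32])

# One-liner fit-and-predict, as above
y_pred = np.poly1d(np.polyfit(X, y, 2))(X)

# Goodness of fit: 1 minus residual sum of squares over total sum of squares
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(r2)
```

A value close to 1 suggests the quadratic captures the trend; a low value is a hint to revisit the degree or the model family.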

Summary/Discussion

  • Method 1: Using numpy and sklearn’s PolynomialFeatures. Strengths: Robust and integrates seamlessly with sklearn’s model ecosystem. Weaknesses: Slightly verbose and requires additional transformation steps.
  • Method 2: Custom Polynomial Regression with numpy. Strengths: Direct and concise. Weaknesses: Limited to polynomial models and lacks integration with machine learning pipelines.
  • Method 3: Leveraging scipy for curve fitting. Strengths: Highly customizable and suited for complex models. Weaknesses: May require deeper mathematical understanding.
  • Method 4: Using statsmodels for a detailed statistical approach. Strengths: Provides in-depth analysis and diagnostics. Weaknesses: More verbose and can be overwhelming for beginners.
  • Method 5: Quick Modeling with numpy. Strengths: Extremely concise and quick for simple tasks. Weaknesses: Not suitable for complex problems and lacks detailed diagnostics.