5 Best Ways to Implement Polynomial Regression in Python

💡 Problem Formulation: Polynomial regression applies when the relationship between the input and the target is non-linear but can be approximated by a polynomial. This article shows how to model such a relationship in Python. Given a set of data points, the goal is to find the polynomial that best fits the trend. The desired output is the polynomial's coefficients and a model that can make predictions.
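
For reference, the fitting problem can be written out as below. The notation is introduced here for illustration only (degree d, coefficients β, n data points), and least squares is assumed as the fitting criterion, which is what the NumPy, Scikit-learn and StatsModels examples in this article use.

```latex
\hat{y}(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_d x^d,
\qquad
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \bigl( y_i - \hat{y}(x_i) \bigr)^2
```

The model is non-linear in x but linear in the coefficients, which is why ordinary linear regression tools can fit it once the powers of x are supplied as features.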

Method 1: Use NumPy for Polynomial Regression

NumPy is a fundamental package for scientific computing in Python. Its np.polyfit() function fits a polynomial of a specified degree to data by least squares. Although NumPy is not a dedicated regression library, this gives a quick solution.

Here's an example:

import numpy as np
import matplotlib.pyplot as plt

# Dataset
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 4, 9, 16, 25])

# Fit a 2nd degree polynomial
p = np.polyfit(x, y, 2)

# Create the polynomial equation
f = np.poly1d(p)

# Plot the data and the fitted curve
plt.scatter(x, y, color='red')
plt.plot(x, f(x), '--')
plt.show()

The output is a plot showing the data points and the fitted polynomial curve.

The snippet imports NumPy and Matplotlib, creates arrays for the x and y values, and fits a second-degree polynomial with np.polyfit(), which returns the coefficients ordered from the highest power to the lowest. It then wraps those coefficients in np.poly1d() to get a callable polynomial and plots it against the original data.
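
If the coefficients and predictions are needed rather than a plot, something like the following may help. It is a minimal sketch on the same toy dataset that compares np.polyfit() with the newer numpy.polynomial.Polynomial class, which the NumPy documentation recommends for new code. The variable names are chosen here for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Same toy dataset as above: y = x**2
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1, 4, 9, 16, 25], dtype=float)

# Legacy interface: coefficients ordered from highest power to lowest
coeffs_high_first = np.polyfit(x, y, 2)

# Newer interface: the fit is done in a scaled domain, so convert() maps
# the result back to the original x scale. Coefficients are ordered from
# lowest power to highest.
poly = Polynomial.fit(x, y, deg=2).convert()
coeffs_low_first = poly.coef

# Evaluate both fits at a new point (x = 6)
pred_legacy = np.polyval(coeffs_high_first, 6.0)
pred_new = poly(6.0)

print(coeffs_high_first, coeffs_low_first)
print(pred_legacy, pred_new)
```

The main thing to watch is the coefficient order, which is reversed between the two interfaces.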

Method 2: Scikit-learn Polynomial Features with Linear Regression

Scikit-learn is a machine learning library for Python. Its PolynomialFeatures transformer expands the inputs into polynomial terms, which can then be passed to LinearRegression to perform polynomial regression.

Here's an example:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np

# Dataset
x = np.array([1, 2, 3, 4, 5]).reshape((-1, 1))
y = np.array([1, 4, 9, 16, 25])

# Transform to polynomial features
poly = PolynomialFeatures(degree=2)
x_poly = poly.fit_transform(x)

# Perform linear regression
model = LinearRegression().fit(x_poly, y)

# Get coefficients
print(model.coef_)
print(model.intercept_)

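The output is the coefficient array, with one entry for each generated feature (1, x and x²), followed by the intercept. Because PolynomialFeatures adds a constant column by default and LinearRegression fits its own intercept, the coefficient on the constant column is zero (up to floating-point error), and the constant term of the polynomial is reported in intercept_.

This method uses Scikit-learn's PolynomialFeatures to expand the feature matrix with polynomial terms, and then fits a LinearRegression model to the expanded features. Because the model is still linear in its coefficients, any linear regression estimator can be applied to polynomial problems in this way.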
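
To make predictions on new inputs, the transformer and the regressor can be chained so that the same expansion is applied at fit and predict time. The following is a minimal sketch using make_pipeline; the choice of include_bias=False and the name poly_model are assumptions made here, not part of the example above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same toy dataset as above, with x as a single feature column
x = np.array([1, 2, 3, 4, 5], dtype=float).reshape(-1, 1)
y = np.array([1, 4, 9, 16, 25], dtype=float)

# include_bias=False drops the constant column, because
# LinearRegression already fits an intercept of its own
poly_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
poly_model.fit(x, y)

# The pipeline expands new inputs to [x, x**2] before predicting
x_new = np.array([[6.0]])
prediction = poly_model.predict(x_new)
print(prediction)
```

With the pipeline there is no need to call fit_transform() by hand, which removes one common source of mismatched features between training and prediction.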

Method 3: Utilize StatsModels for Detailed Regression Analysis

StatsModels is a Python module that provides classes and functions for estimating many different statistical models. Its regression results come with detailed diagnostic output.

Here's an example:

import numpy as np
import statsmodels.api as sm

# Dataset
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 4, 9, 16, 25])

# Build the design matrix with columns [const, x**2, x]
X = sm.add_constant(np.column_stack((x**2, x)))

# Create the OLS model and fit it
model = sm.OLS(y, X).fit()

# Print the summary
print(model.summary())

The output is a detailed table containing statistics such as the coefficient estimates, standard errors, t-statistics, p-values, and the R-squared value of the model.

In this example, StatsModels' OLS class fits an ordinary least squares model whose regressors are a constant, x² and x. The model is linear in its coefficients, so adding the squared term is all it takes to turn a straight-line fit into a quadratic one. The summary provides a wealth of statistical information, which makes this approach well suited to in-depth analysis.
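
To see what the coefficient and R-squared entries of such a summary represent, here is a rough NumPy-only sketch of the same least-squares fit. It is not how StatsModels is implemented internally; it only reproduces the two quantities from their textbook definitions, and the column order [1, x, x²] is a choice made here.

```python
import numpy as np

# Same toy dataset as above
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1, 4, 9, 16, 25], dtype=float)

# Design matrix with columns [1, x, x**2]
X = np.column_stack((np.ones_like(x), x, x**2))

# Least-squares coefficient estimates
beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

# R-squared = 1 - (residual sum of squares / total sum of squares)
y_hat = X @ beta
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(beta)
print(r_squared)
```

On this noise-free dataset the fit is exact, so R-squared is 1 up to floating-point error. On real, noisy data it falls below 1.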

Method 4: TensorFlow and Keras for Deep Learning Polynomial Regression

TensorFlow and Keras provide powerful platforms for building and training machine learning models, including regression models that can capture non-linear relationships as polynomials.

Here's an example:

import numpy as np
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Dataset
x = np.array([1, 2, 3, 4, 5], dtype=float).reshape(-1, 1)  # one feature column
y = np.array([1, 4, 9, 16, 25], dtype=float)

# Build a Sequential model
model = Sequential([Dense(units=1, input_shape=[1])])

# Compile the model
model.compile(optimizer='sgd', loss='mean_squared_error')

# Train the model
model.fit(x, y, epochs=500)

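# Predict y for x = 6 (a 2-D NumPy array is the safest input format across Keras versions)
print(model.predict(np.array([[6.0]])))

The output is the training log followed by the model's prediction for y at x = 6. Because this model can only represent a straight line, the prediction will not reach the true value of 36. The least-squares line through these five points is y = 6x - 7, so a fully converged fit would give about 29 at x = 6.

This code builds a sequential model with a single dense unit and no activation function, which is equivalent to linear regression: it can only fit a straight line. To capture the quadratic trend, either supply polynomial features (for example x and x²) as inputs to the same layer, or add hidden layers with non-linear activations so the network can approximate the curve. In the second case the network approximates the relationship rather than recovering polynomial coefficients.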
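
To illustrate the point about polynomial inputs without depending on TensorFlow, here is a plain NumPy sketch of a single linear unit trained by gradient descent on the features x and x². It is a stand-in for the Keras layer, not Keras code. The feature standardization, the learning rate and the step count are assumptions chosen for this toy dataset; without scaling, the large x² values tend to make plain gradient descent diverge at typical learning rates.

```python
import numpy as np

# Same toy dataset as above
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1, 4, 9, 16, 25], dtype=float)

def make_features(values, mean, std):
    """Stack x and x**2 as columns and standardize each column."""
    raw = np.column_stack((values, values**2))
    return (raw - mean) / std

# Scaling statistics come from the training data only
raw_train = np.column_stack((x, x**2))
mean, std = raw_train.mean(axis=0), raw_train.std(axis=0)
features = make_features(x, mean, std)

# One linear unit: two weights plus a bias
weights = np.zeros(2)
bias = 0.0
learning_rate = 0.1  # hypothetical value, tuned for this toy example

# Full-batch gradient descent on the mean squared error
for _ in range(10_000):
    error = features @ weights + bias - y
    weights -= learning_rate * 2.0 * features.T @ error / len(y)
    bias -= learning_rate * 2.0 * error.mean()

# Predict at x = 6 using the same scaling as during training
prediction = make_features(np.array([6.0]), mean, std) @ weights + bias
print(prediction)
```

With the squared term available as an input, the linear unit can represent the quadratic exactly, so the prediction at x = 6 comes out very close to 36.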

Bonus One-Liner Method 5: Use SymPy for Symbolic Polynomial Regression

SymPy is a Python library for symbolic mathematics. Its interpolate() function returns, as a symbolic expression, the polynomial that passes exactly through a set of points.

Here's an example:

from sympy import symbols, interpolate

# Symbolic variable
x = symbols('x')

# Dataset
points = [(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]

# Find the lowest-degree polynomial passing through every point
p = interpolate(points, x)

print(p)

The output is the symbolic expression x**2, the polynomial that fits the given points.

This one-liner uses SymPy's interpolate() function to compute a symbolic polynomial directly from the data points. Note that this is exact interpolation rather than least-squares regression: with n points the result can have degree up to n - 1, so noisy data is matched point by point instead of being smoothed.

Summary/Discussion

  • Method 1: NumPy. Quick and easy. Not designed for complex regression tasks or datasets with lots of features.
  • Method 2: Scikit-learn. Combines well with other Scikit-learn features. The number of generated features grows quickly with the polynomial degree and the number of input variables.
  • Method 3: StatsModels. Offers extensive statistical insights. More verbose and less intuitive than other libraries.
  • Method 4: TensorFlow and Keras. Scales well to very complex models and datasets. Requires more knowledge about neural networks and training.
  • Method 5: SymPy. Offers symbolic polynomial equations. Not suitable for large or noisy datasets.
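
To put a number on the Scikit-learn trade-off in the list above: with n input variables and degree d, a full polynomial expansion including the constant term has C(n + d, d) terms. Here is a minimal sketch using only the standard library. The helper name n_polynomial_terms is hypothetical and not part of any library.

```python
from math import comb

def n_polynomial_terms(n_inputs: int, degree: int) -> int:
    """Number of monomials of total degree <= degree in n_inputs variables."""
    return comb(n_inputs + degree, degree)

# A few (inputs, degree) combinations to show the growth
counts = {
    (n, d): n_polynomial_terms(n, d)
    for n, d in [(1, 2), (2, 2), (10, 3), (100, 3)]
}
print(counts)
```

A single input at degree 2 yields only three terms, while 100 inputs at degree 3 already yield well over a hundred thousand.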