💡 Problem Formulation: Polynomial regression is applied when data points form a non-linear relationship. This article outlines how to model such a relationship in Python. For instance, given a set of data points, we aim to find a polynomial equation that best fits the trend. The desired output is the equation's coefficients and a predictive model.
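For a concrete picture of the input and desired output, consider data lying on a quadratic trend (the values below are illustrative, and are the dataset reused throughout this article):

# Input: observed data points
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

# Desired output: the best-fit polynomial's coefficients,
# roughly [1, 0, 0] for y = 1*x**2 + 0*x + 0 here,
# plus a model that can predict y for unseen x values.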
Method 1: Use NumPy for Polynomial Regression
NumPy is a fundamental package for scientific computing in Python that includes a method to fit a polynomial of a specified degree to data. While it’s not specialized for regression models, it can be used to obtain a quick solution.
Here’s an example:
import numpy as np
import matplotlib.pyplot as plt

# Dataset
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 4, 9, 16, 25])

# Fit a 2nd degree polynomial
p = np.polyfit(x, y, 2)

# Create the polynomial equation
f = np.poly1d(p)

# Plot the data and the fitted curve
plt.scatter(x, y, color='red')
plt.plot(x, f(x), '--')
plt.show()
The output is a plotted graph illustrating both the data points and the fitted polynomial curve.
The code snippet first imports NumPy and Matplotlib. It then creates arrays for the x and y values and fits a second-degree polynomial to the data with np.polyfit(). Lastly, it uses np.poly1d() to generate the polynomial equation and plots it against the original data.
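Note that np.polyfit() belongs to NumPy's legacy polynomial interface; the NumPy documentation recommends the newer numpy.polynomial.Polynomial class for new code. A minimal sketch of the same fit with that API:

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 4, 9, 16, 25])

# Fit a 2nd degree polynomial; coefficients are ordered from
# lowest to highest degree, the reverse of np.polyfit()
f = np.polynomial.Polynomial.fit(x, y, 2)

# convert() maps the internally scaled domain back to the
# original x values before reading off the coefficients
print(f.convert().coef)   # approximately [0, 0, 1], i.e. y = x**2
print(f(6))               # prediction at x = 6, approximately 36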
Method 2: Scikit-learn Polynomial Features with Linear Regression
Scikit-learn is a machine learning library for Python, which provides a PolynomialFeatures transformer to be used in conjunction with linear regression for polynomial regression.
Here’s an example:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np

# Dataset
x = np.array([1, 2, 3, 4, 5]).reshape((-1, 1))
y = np.array([1, 4, 9, 16, 25])

# Transform to polynomial features
poly = PolynomialFeatures(degree=2)
x_poly = poly.fit_transform(x)

# Perform linear regression
model = LinearRegression().fit(x_poly, y)

# Get coefficients
print(model.coef_)
print(model.intercept_)
The output displays the coefficients of the polynomial equation and the intercept derived from the regression analysis.
This method utilizes Scikit-learn's PolynomialFeatures to expand the feature matrix with polynomial terms (here a bias column, x, and x²). A LinearRegression model is then fitted on the expanded features, which allows standard linear regression machinery to be applied directly to polynomial problems.
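In practice, the transformer and the regressor are often chained into a single estimator. Here is a minimal sketch using Scikit-learn's make_pipeline, reusing the dataset and degree from the example above:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np

x = np.array([1, 2, 3, 4, 5]).reshape((-1, 1))
y = np.array([1, 4, 9, 16, 25])

# Chain the feature expansion and the regression into one estimator
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)

# The pipeline transforms new inputs automatically before predicting
print(model.predict(np.array([[6]])))  # approximately [36.]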
Method 3: Utilize StatsModels for Detailed Regression Analysis
StatsModels is a Python module that provides classes and functions for estimating many different statistical models. It produces detailed diagnostic output for regression analyses.
Here’s an example:
import numpy as np
import statsmodels.api as sm

# Dataset
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 4, 9, 16, 25])

# Add the polynomial feature (x squared) and a constant term
x = sm.add_constant(np.column_stack((x**2, x)))

# Create the model and fit it
model = sm.OLS(y, x).fit()

# Print the summary
print(model.summary())
The output is a detailed table containing statistics like the coefficient values, t-stats, p-values, and R-squared value for the model.
In this example, StatsModels' OLS function fits a linear model whose design matrix has been extended with a polynomial feature (the square of the x-values). The summary output provides a wealth of statistical information, making this method ideal for more in-depth analysis.
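When only a few statistics are needed rather than the full summary table, the fitted results object exposes them as attributes. A brief sketch, refitting the same model as above:

import numpy as np
import statsmodels.api as sm

x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 4, 9, 16, 25])
X = sm.add_constant(np.column_stack((x**2, x)))
model = sm.OLS(y, X).fit()

# Individual statistics, without printing the full summary table
print(model.params)     # estimated coefficients: const, x**2, x
print(model.pvalues)    # p-values for each coefficient
print(model.rsquared)   # coefficient of determination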
Method 4: TensorFlow and Keras for Deep Learning Polynomial Regression
TensorFlow and Keras provide powerful platforms for building and training machine learning models, including regression models that can capture non-linear relationships such as polynomial trends.
Here’s an example:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Dataset
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1, 4, 9, 16, 25], dtype=float)

# Build a Sequential model
model = Sequential([Dense(units=1, input_shape=[1])])

# Compile the model
model.compile(optimizer='sgd', loss='mean_squared_error')

# Train the model
model.fit(x, y, epochs=500)

# Predict (the input must be a NumPy array, not a plain list)
print(model.predict(np.array([[6.0]])))
The output is a prediction for the y-value when x is 6, after the model training.
This code builds a simple sequential model with a single dense layer, which effectively performs plain linear regression. By adding hidden layers with non-linear activations and training for more epochs, the network can be made to approximate polynomial relationships, as sketched below.
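As a minimal sketch of that idea, the following assumes two small hidden layers with ReLU activations and the Adam optimizer; these are illustrative choices, not part of the original example:

import numpy as np
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1, 4, 9, 16, 25], dtype=float)

# Hidden layers with non-linear activations let the network
# approximate curved (e.g. quadratic) relationships
model = Sequential([
    Dense(units=16, activation='relu', input_shape=[1]),
    Dense(units=16, activation='relu'),
    Dense(units=1),
])
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x, y, epochs=2000, verbose=0)

# Prediction at x = 6; accuracy depends on training and on how
# far x lies outside the training range
print(model.predict(np.array([[6.0]])))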
Bonus One-Liner Method 5: Use SymPy for Symbolic Polynomial Interpolation
SymPy is a Python library for symbolic mathematics. Its interpolate() function constructs the exact polynomial that passes through a set of data points, yielding a symbolic model of the data.
Here’s an example:
from sympy import interpolate, symbols

# Symbolic variable
x = symbols('x')

# Dataset
points = [(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]

# Construct the interpolating polynomial through the points
p = interpolate(points, x)
print(p)
The output is x**2, the symbolic polynomial that passes exactly through the given points.
This one-liner leverages SymPy's interpolate() function to construct a symbolic polynomial through the provided data points directly. Note that this is exact interpolation rather than least-squares fitting, so it is best suited to clean, noise-free data.
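To use the symbolic result as a numeric predictor, SymPy's lambdify() can turn the expression into an ordinary Python function. A brief sketch, repeating the interpolation from above:

from sympy import interpolate, lambdify, symbols

x = symbols('x')
points = [(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]
p = interpolate(points, x)

# Convert the symbolic polynomial into a callable numeric function
predict = lambdify(x, p)
print(predict(6))  # 36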
Summary/Discussion
- Method 1: NumPy. Quick and easy. Not designed for complex regression tasks or datasets with lots of features.
- Method 2: Scikit-learn. Combines well with other Scikit-learn features. Less efficient for high-degree polynomials.
- Method 3: StatsModels. Offers extensive statistical insights. More verbose and less intuitive than other libraries.
- Method 4: TensorFlow and Keras. Scales well to very complex models and datasets. Requires more knowledge about neural networks and training.
- Method 5: SymPy. Offers exact symbolic polynomial equations via interpolation. Not suitable for large or noisy datasets.