5 Best Ways to Get the Least Squares Fit of a Polynomial to Data in Python

💡 Problem Formulation: Fitting a polynomial to a set of data points is a common problem in statistics and data analysis, wherein we search for the polynomial that best approximates the data using the least squares criterion. Given a dataset with inputs x and outputs y, the objective is to find polynomial coefficients a that minimize the sum of squared residuals between the observed outputs and the outputs predicted by the polynomial.
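
To make the criterion concrete, here is a minimal sketch that evaluates the sum of squared residuals for a candidate quadratic; the coefficient values are illustrative, not the optimal fit:

import numpy as np

# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 6, 7, 10, 15])

# Candidate coefficients, highest power first: a*x^2 + b*x + c
candidate = np.array([0.7, -1.9, 6.4])

# Sum of squared residuals -- the quantity least squares minimizes
residuals = y - np.polyval(candidate, x)
print(np.sum(residuals**2))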

Method 1: Using NumPy’s polyfit function

NumPy’s polyfit function provides a straightforward way to perform polynomial fitting. It returns the coefficients of a polynomial of degree d that fits the data in the least squares sense. The function takes the x and y data points along with the desired polynomial degree and returns the coefficients ordered from the highest power down to the constant term.

Here’s an example:

import numpy as np

# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 6, 7, 10, 15])

# Fitting a polynomial of degree 2
coefficients = np.polyfit(x, y, 2)
print(coefficients)

Output: [ 0.71428571 -1.88571429  6.4 ]

This code snippet shows how to use NumPy’s polyfit function to fit a second-degree polynomial to the sample data points. It produces an array of coefficients, highest power first, which represents the fitted polynomial: y ≈ 0.7143x^2 - 1.8857x + 6.4. The fit is computed by minimizing the sum of the squares of the differences between the observed values and those that the model predicts.
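
With the coefficients in hand, a common follow-up is to evaluate the fitted polynomial at new inputs with np.polyval. A short sketch, where the query points are hypothetical:

x_new = np.array([6, 7])
print(np.polyval(coefficients, x_new))  # approximately [20.8 28.2] for the fit above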

Method 2: Using SciPy’s curve_fit function

The curve_fit function from SciPy’s optimize module is designed for general curve fitting but can be used for polynomial fitting as well. It takes a model function that defines the curve or polynomial you’re fitting, in addition to the x and y data points, infers the number of free parameters from the model function’s signature, and returns the optimal coefficients along with their covariance matrix.

Here’s an example:

import numpy as np
from scipy.optimize import curve_fit

# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 6, 7, 10, 15])

# Polynomial model function
def poly_model(x, a, b, c):
    return a * x**2 + b * x + c

# Curve fitting
params, covariance = curve_fit(poly_model, x, y)
print(params)

Output: [ 0.71428571 -1.88571429  6.4 ]

This code uses SciPy’s curve_fit to fit a second-degree polynomial model to the data. A quadratic model function is defined and passed to curve_fit along with the x and y data, and curve_fit returns the optimal coefficients. These can then be used to predict new values or to analyze the quality of the fit.
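
The covariance matrix is the main extra that curve_fit returns beyond the coefficients themselves. A common sketch for turning it into per-coefficient standard errors:

perr = np.sqrt(np.diag(covariance))  # standard error of each fitted coefficient
print(perr)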

Method 3: Using NumPy’s vander function with lstsq

Another approach is to use NumPy’s vander function to build a Vandermonde matrix and then solve the resulting least squares problem with the lstsq function. This two-step process allows for more flexibility, such as the possibility of incorporating weights or other constraints into the fitting process.

Here’s an example:

import numpy as np

# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 6, 7, 10, 15])

# Create Vandermonde matrix for a 2nd degree polynomial
X = np.vander(x, 3)

# Least squares fitting
coefficients, residuals, rank, s = np.linalg.lstsq(X, y, rcond=None)
print(coefficients)

Output: [ 0.71428571 -1.88571429  6.4 ]

In this approach, the Vandermonde matrix holds the powers of the input data x required for the polynomial fitting; np.vander orders its columns from the highest power down to the constant, so the result matches polyfit’s ordering. The lstsq function then computes the least squares solution to the equation Xa = y, and when it returns, coefficients contains the polynomial coefficients that best fit the data.
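
To illustrate the flexibility this approach offers, here is a sketch of a weighted fit: scaling each row of X and each entry of y by the square root of a weight makes lstsq minimize the weighted sum of squared residuals. The weights below are purely illustrative:

w = np.array([1.0, 1.0, 1.0, 1.0, 4.0])  # hypothetical weights; larger = more trusted
sw = np.sqrt(w)
w_coefficients, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
print(w_coefficients)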

Method 4: Using PolynomialFeatures and LinearRegression from scikit-learn

For data science practitioners, the scikit-learn library offers a more machine learning-oriented approach. The PolynomialFeatures transformer can be used in combination with LinearRegression to perform polynomial fitting, which can be especially convenient within a machine learning pipeline.

Here’s an example:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Sample data
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([5, 6, 7, 10, 15])

# Create polynomial features
poly_features = PolynomialFeatures(degree=2)
X_poly = poly_features.fit_transform(x)

# Fit linear model
model = LinearRegression().fit(X_poly, y)
print(model.coef_)
print(model.intercept_)

Output: [ 0. -1.88571429 0.71428571] and 6.4

This example demonstrates how to use scikit-learn’s PolynomialFeatures in combination with LinearRegression. First, we transform our input data into polynomial features of the desired degree (ordered constant first: 1, x, x^2), and then fit a linear model. Because LinearRegression fits its own intercept, the coefficient of the bias column is zero; the remaining coefficients together with the intercept give the polynomial that best fits our data.
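
To predict at new inputs, the same PolynomialFeatures object must transform them before calling predict. A minimal sketch, where x_new is a hypothetical query point:

x_new = np.array([[6]])
print(model.predict(poly_features.transform(x_new)))  # approximately [20.8] for the fit above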

Bonus One-Liner Method 5: Using NumPy’s polynomial class

NumPy’s Polynomial class provides a modern interface for polynomial operations, including fitting. By using the Polynomial.fit class method, one can directly compute the least squares polynomial fit.

Here’s an example:

import numpy as np
import numpy.polynomial.polynomial as poly

# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 6, 7, 10, 15])

# Fit polynomial of degree 2
coefs = poly.Polynomial.fit(x, y, 2).convert().coef
print(coefs)

Output: [ 6.4 -1.88571429 0.71428571]

In this one-liner, the Polynomial.fit method from NumPy’s polynomial package computes the fit directly. Because fit works on a scaled and shifted domain internally, the call is chained with convert() to map the result back to the standard power basis. Note that coef is in ascending order, from the constant term up to the highest power.
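
Because Polynomial.fit returns a polynomial object rather than a bare coefficient array, it can also be called directly to evaluate the fit at new points, as in this short sketch (the query point is hypothetical):

p = poly.Polynomial.fit(x, y, 2)
print(p(6))  # approximately 20.8 for the data above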

Summary/Discussion

  • Method 1: NumPy’s polyfit. Strengths: Simple and direct. Weaknesses: Limited to polynomial fitting and returns only the coefficients by default (a covariance estimate requires the cov=True flag).
  • Method 2: SciPy’s curve_fit. Strengths: More general, can be used for different model functions. Weaknesses: Slightly more complex due to the need to define the model function.
  • Method 3: NumPy’s vander with lstsq. Strengths: Offers additional flexibility, such as incorporating weights or constraints. Weaknesses: Requires deeper understanding of the underlying mathematics.
  • Method 4: PolynomialFeatures with LinearRegression. Strengths: Fits into the scikit-learn workflow and suitable for machine learning applications. Weaknesses: Heavier dependency for a simple task and not as performant for very large datasets.
  • Method 5: NumPy’s Polynomial class. Strengths: Modern, object-oriented approach and easy to use. Weaknesses: Less widely known, so examples and community support are sparser than for polyfit.