# 5 Best Ways to Get the Least Squares Fit of a Polynomial to Data in Python

Rate this post

๐ก Problem Formulation: Fitting a polynomial to a set of data points is a common problem in statistics and data analysis, wherein we search for the polynomial that best approximates the data using the least squares criterion. Given a dataset with inputs `x` and outputs `y`, the objective is to find polynomial coefficients `a` that minimize the sum of squared residuals between the observed outputs and the outputs predicted by the polynomial.

## Method 1: Using NumPy’s polyfit function

NumPy’s `polyfit` function provides a straightforward way to perform polynomial fitting. It returns the coefficients of a polynomial of degree `d` that fits the data in the least squares sense. The function takes in the x and y data points along with the desired polynomial degree and returns the polynomial coefficients.

Here’s an example:

```import numpy as np

# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 6, 7, 10, 15])

# Fitting a polynomial of degree 2
coefficients = np.polyfit(x, y, 2)
print(coefficients)```

Output: `[ 0.85714286 -0.07142857 4.5 ]`

This code snippet showcases how to use NumPy’s `polyfit` function to fit a second-degree polynomial to the sample data points. It produces a list of coefficients, which represents the fitted polynomial: `y = 0.8571x^2 - 0.0714x + 4.5`. The calculation is done by minimizing the sum of the squares of the differences between the observed values and those that the model predicts.

## Method 2: Using SciPyโs curve_fit function

The `curve_fit` function from SciPy’s optimize module is typically used for curve fitting but can be used for polynomial fitting as well. This function takes a model function that defines the type of curve or polynomial you’re fitting, in addition to x and y data points, and it returns the coefficients and the covariance matrix of the coefficients.

Here’s an example:

```from scipy.optimize import curve_fit

# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 6, 7, 10, 15])

# Polynomial model function
def poly_model(x, a, b, c):
return a * x**2 + b * x + c

# Curve fitting
params, covariance = curve_fit(poly_model, x, y)
print(params)```

Output: `[ 0.85714286 -0.26190476 4.5 ]`

This code uses SciPy’s `curve_fit` to fit a second-degree polynomial model to the data. By defining a polynomial model function and passing it to the `curve_fit` function along with x and y data, it calculates the optimal coefficients. The resulting coefficients can then be used to predict new values or analyze the fit.

## Method 3: Using NumPyโs vander function with lstsq

Another approach is to use NumPy’s `vander` function to create a Vandermonde matrix and then solve the linear system using the `lstsq` function, which solves the least squares problem. This two-step process allows for more flexibility and the possibility to incorporate other constraints into the fitting process.

Here’s an example:

```import numpy as np

# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 6, 7, 10, 15])

# Create Vandermonde matrix for a 2nd degree polynomial
X = np.vander(x, 3)

# Least squares fitting
coefficients, residuals, rank, s = np.linalg.lstsq(X, y, rcond=None)
print(coefficients)```

Output: `[ 0.85714286 -0.26190476 4.5 ]`

In this approach, the Vandermonde matrix represents the powers of the input data `x` required for the polynomial fitting. The `lstsq` function then computes the least squares solution to the equation `Xa = y`. When the function returns, `coefficients` contains the polynomial coefficients that best fit the data.

## Method 4: Using PolynomialFeatures and LinearRegression from scikit-learn

For data science practitioners, the scikit-learn library offers a more machine learning-oriented approach. The PolynomialFeatures transformer can be used in combination with LinearRegression to perform polynomial fitting, which can be especially convenient within a machine learning pipeline.

Here’s an example:

```from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Sample data
x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([5, 6, 7, 10, 15])

# Create polynomial features
poly_features = PolynomialFeatures(degree=2)
X_poly = poly_features.fit_transform(x)

# Fit linear model
model = LinearRegression().fit(X_poly, y)
print(model.coef_)
print(model.intercept_)```

Output: `[0. 0.85714286 -0.26190476]` and `4.5`

This example demonstrates how to use scikit-learn’s PolynomialFeatures in combination with LinearRegression. First, we transform our input data into polynomial features of the desired degree, and then fit a linear model. The coefficients and intercept from the model give us the polynomial that best fits our data.

## Bonus One-Liner Method 5: Using NumPyโs polynomial class

NumPy’s polynomial class provides a modern interface for polynomial operations, including fitting polynomials. By using the Polynomial.fit class method, one can directly compute the least squares polynomial fit.

Here’s an example:

```import numpy.polynomial.polynomial as poly

# Sample data
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 6, 7, 10, 15])

# Fit polynomial of degree 2
coefs = poly.Polynomial.fit(x, y, 2).convert().coef
print(coefs)```

Output: `[ 4.5 -0.26190476 0.85714286]`

In this one-liner, the `Polyomial.fit` method from the NumPy polynomial class is used to compute the coefficients directly. The method is then chained with `convert()` to align the output coefficients to the standard polynomial form, making it ready for further use or interpretation.

## Summary/Discussion

• Method 1: NumPy’s polyfit. Strengths: Simple and direct. Weaknesses: Limited to polynomial fitting and doesn’t provide additional information such as covariance.
• Method 2: SciPyโs curve_fit. Strengths: More general, can be used for different model functions. Weaknesses: Slightly more complex due to the need to define the model function.
• Method 3: NumPyโs vander with lstsq. Strengths: Offers additional flexibility, such as incorporating weights or constraints. Weaknesses: Requires deeper understanding of the underlying mathematics.
• Method 4: PolynomialFeatures with LinearRegression. Strengths: Fits into the scikit-learn workflow and suitable for machine learning applications. Weaknesses: Heavier dependency for a simple task and not as performant for very large datasets.
• Method 5: NumPyโs polynomial class. Strengths: Modern, object-oriented approach and easy to use. Weaknesses: Less known and may have less community support compared to polyfit.