scipy.interpolate.interp(1D, 2D, 3D)
In this article we will explore how to perform interpolations in Python, using the Scipy library.
Scipy provides a lot of useful functions which allows for mathematical processing and optimization of the data analysis. More specifically, speaking about interpolating data, it provides some useful functions for obtaining a rapid and accurate interpolation, starting from a set of known data points. In the following text, we will analyze three different interpolation scenarios; one-dimensional interpolation two and three-dimensional interpolation.
The functions that will be used in the code snippets are taken from the
scipy.interpolate library, and are:
From a mathematical point of view, interpolation indicates the process of obtaining the value of specific unknown data points that are located between some other known data points, after having described the known set of data points with an opportune function.
For example, if we have a series of data points x0, x1, x2,…xn and we know the values y0, y1, y2,…yn (with yn = f(xn)), through the process of interpolation, we can determine the value ym = f(xm), where xm is a point located in between two of the already known points, i.e. when x0 < xm < xn. This can be done by first calculating the function that best describes the trend of our known data points and then by evaluating the value of that function in specific unknown points. Of course, all this procedure is performed automatically by our terminal; we only receive as output the values of the points that we are interested in. With this being said, I hope I convinced you that interpolation represents a powerful tool for data analysis, for making predictions and for a lot of other different applications.
The following paragraphs explain how to perform an interpolation when dealing with 1, 2 or 3-dimensional data sets. To do that, we will rely on the Python library Scipy, more specifically on one of its packages called
interpolate which provide the function
.interp() to perform in an easy and immediate way this task.
Let’s begin by first importing the function that will be used to perform the interpolation.
As already introduced, the function is called
interpolate.interp1d() and belongs to the Scipy package. Since we will use different interpolating functions for each dimensions (all of them belonging to
.interpolate), we will just import
.interpolate from the Scipy library. First of all, we need to create a data set that will be used to show the interpolation process. We will do this, defining an x array (using the Numpy function
.linspace()) of ten equally spaced numbers, ranging from 0 to 100. The y array, instead, will be defined by the following equation:
import numpy as np import matplotlib.pyplot as plt from scipy import interpolate #defining x and y arrays of the initial data set x = np.linspace(0, 100,10) y = 3*x**2 – np.exp(0.1*x)
Since the process of interpolation allows for obtaining the value of unknown points located within the range of the already known ones, we now define another x array that will contain more points than the first x array (“x”). In particular, we exploit again
.linspace() to build an array of 100 equally spaced numbers. We then call this array “x_new”.
# x array that will be used for interpolating new point values x_new = np.linspace(0, 100, 100)
At this point, we can already interpolate our initial data set and obtain the values of the new points, that we have stored in the “x_new” array. To do that, we exploit the
.interpolate.interp1d() function; which takes as mandatory inputs the x and y arrays in which are stored the values of the known data points and returns as output the interpolating function with which we can then obtain the values of unknown points. Another optional but very important input that can be specified to the
.interp1d() function is “kind”, which specifies the type of function that will be used in the interpolating process. There are multiple “kind” options, they are:
kind = ['linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'previous', 'next']
The most widely used are
'cubic', which refer to a spline interpolation of zeroth, first, second or third order, respectively.
'next' simply return the previous or next value of the point (please refer to https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html for the complete documentation on
In order to see all these different interpolating functions plotted together, we can exploit a for loop and iterate the process of interpolation and plotting of the data points, as showed in the code snippet below.
kind = ['linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'previous', 'next'] fig = plt.figure() ax = fig.subplots() for i in kind: #interpolation step f = interpolate.interp1d(x, y, kind = i) #y array that contains the interpolated data points y_interp = f(x_new) ax.plot(x_new, y_interp, alpha = 0.5, label = i) ax.scatter(x,y) plt.legend() plt.show()
As you can see in the code snippet, within the for loop, we make the interpolation by calling the function
.interp1d() and giving as inputs the x and y array defined at the beginning of the paragraph; the interpolating function is then assigned to the variable “f”. At each iteration step, the “kind” of interpolation will change, selecting along the different kinds contained in the list “kind”. To finally obtain the values of the unknown points, contained within the array “x_new”, we define the array “y_interp” by applying the just calculated interpolating function “f” to the “x_new” array. The final result is displayed in Figure 1.
Figure 1: Different interpolating functions (kinds). The blue dots are the initial, known data points; as can be seen, through the process of interpolation we are now able to obtain the values of all those points located in between the blue ones.
It is important to stress that the only known points from which we derived all the plots shown in Figure 1, are the blue ones (ten points). Through the process of interpolation, we have obtained the value of all the points that are located in between the range of these ten data points. In general, when interpolating a given data set, it is important to gain as more information as possible on the distribution of the known data points; this helps to understand which “kind” of interpolating function will yield the best results. However, in most of the cases, the quadratic and cubic interpolation are the ones yielding the best results, as you can see, they are superimposed for almost all the data points.
Now that we have introduced the interpolation procedure on one-dimensional data sets, it’s time to apply the same thing in two dimensions. As you will see, the procedure is very similar; this time, the function that will be used is called
Since we are dealing with two-dimensional data points, in order to plot them, we need to create a grid of points and then to assign a specific value to all the points on the grid; these will be our initial, known data points from which we interpolate the values of other data points.
To build our grid of points, we firstly define an x and y arrays (called “x” and “y”) by using
.linspace(); this time, the points on our grid will be 13 and will range from zero to four. To define a grid from these two arrays, we use the Numpy function
.meshgrid(). The following code snippet describe the creation of the grid.
x = np.linspace(0, 4, 13) y = np.linspace(0, 4, 13) X, Y = np.meshgrid(x, y)
To complete the definition of our initial set of data points, we have to assign a specific value to all the couples (x,y) of points on the grid. To do that, we define a new array called Z, which depends on the values of X and Y (the points of the grid) and is defined by the following equation:
Z = np.arccos(-np.cos(2*X) * np.cos(2*Y))
Similarly to what we did in the one-dimensional case, we now define a new and denser grid that contains the points that will be interpolated from the (X, Y) values. The 65 points of this new grid still range from 0 to four and are stored in the “x2” and “y2” array. The process is the same as the one used for defining the first grid.
#denser grid of points that we want to interpolate x2 = np.linspace(0, 4, 65) y2 = np.linspace(0, 4, 65) X2, Y2 = np.meshgrid(x2, y2)
The next step is the interpolation; we call the function
.interp2d() and assign its output (the interpolating function) to the variable “f”. Also in the two-dimensional case, we can chose which “kind” of interpolating function to use in the process, this time there are only three options, “linear”, “cubic” and “quantic”, which describe the type of splines used in the interpolation (to know more on the concept of splines, please refer to https://en.wikipedia.org/wiki/Spline_(mathematics) ). We finally assign to the variable Z2, the values of the interpolated points that we previously store in the x2 and y2 arrays. The following lines of code describe the interpolation process.
#interpolation f = interpolate.interp2d(x, y, z, kind = ‘cubic’) Z2 = f(x2, y2)
With this step, we completed the 2-D interpolation, and we can hence plot the results in order to have a graphical representation of what has been done by the function. For a better understanding of the interpolation process in two dimensions, we plot both the initial 13×13 grid (left) and the 65×65 interpolated one (right).
Our plots will display the grids of points and will describe the value of each (x,y) couple with a color scale. To achieve such a result, we can exploit the Matplotlib function
.pcolormesh() which allows creating a pseudocolor plot with a non-regular rectangular grid (https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.pcolormesh.html ).
#Plotting fig = plt.figure() ax = fig.subplots(1,2) ax.pcolormesh(X, Y, Z) ax.pcolormesh(X2, Y2, Z2) plt.show()
The final result is displayed in Figure 2:
Figure 2: Result of .interp2d(); starting from a 13×13 grid (left), we can interpolate the values assigned to each (x, y) couple and obtain the values of the couples of points along a 65×65 grid (right).
As you can see from Figure 2, through the process of 2D interpolation, we have densified the first grid by interpolating the value of additional points contained within the range of the initial grid points.
We conclude this article with the last interpolation, we increase again the dimensions and tackle the three-dimensional case. To accomplish this task, we exploit the function .interpn(), which can be used, more generally, for multidimensional interpolations on regular grids (more documentation can be found here https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interpn.html ); this means that we can use this function to perform interpolations on data with n dimensions, with n that can be even higher than 3.
Similarly to the other cases, we start our code by defining the arrays that will constitute our 3D grid, this time we will need three equal arrays, called “x”, “y”, “z”. We then store all of them within a tuple called “points” which will come handy later on. Moreover, we define the 3D grid, using again
#arrays constituting the 3D grid x = np.linspace(0, 50, 50) y = np.linspace(0, 50, 50) z = np.linspace(0, 50, 50) points = (x, y, z) #generate a 3D grid X, Y, Z = np.meshgrid(x, y, z)
At this points we have to assign a value to all the triples of (x, y, z) points on the grid; to do that we define the function “func_3d(x,y,z)”, which for a specific set of x,y and z values, returns the expression:
As you can see, the function depends on three independent variables. The values of all the (x, y, z) triples will be stored in the array “values”, defines by calling the function “func_3d” on all the X, Y, Z points.
#evaluate the function on the points of the grid values = func_3d(X, Y, Z)
Since it would not be possible to plot the created grid (it would result in a four-dimensional plot); we just define an array containing the triples of points that we want to interpolate in the form of lists. In our case, we will perform the interpolation just on a single triple, defined in the array “point”.
point = np.array([2.5, 3.5, 1.5])
We now call the
.interpn() function to perform the interpolation. Differently from the previous two functions,
.interpn() does not have the option “kind”, but instead it presents the one called “method”; the default value is “linear”. The inputs of this function are the tuple containing all the three arrays that made up the initial 3D grid (namely “x”, “y” and “z”, stored in the tuple “points”), the values assigned to each triple (stored in the array “values”) and the array containing the coordinates of the points in which we want to perform the interpolation (in our case, just one point, whose coordinates are stored in “point”). We include all this into a “print” command in order to directly obtain the result of the interpolation:
# points = the regular grid, #values =the data on the regular grid # point = the point that we want to evaluate in the 3D grid print(interpolate.interpn(points, values, point))
The final result is 13.0; which is the interpolated value for the point of coordinates (2.5, 3.5, 1.5).