This article deals with the analysis and processing of signals, more specifically on how to identify and calculate the peaks contained in a given signal.
Being able to identify and hence work with the peaks of a signal is of fundamental importance in lots of different fields, from electronics to data science and economics.
When we talk about peaks, we are not just referring to the peaks of an electric signal, even the maxima or minima in a mathematical function are regarded as peaks. Bearing this in mind, we all know the importance of having a fast and reliable method that could allow us determining the position and value of the maxima and minima in a function; is it just for solving a mathematical exercise or for predicting economics trend, the number of application is enormous.
Code Example Peak Finding and Plotting
We herein exploit the function
.find_peaks() from the
Scipy.singnal library, to process a specific signal/function and extract the position and intensity of multiple peaks.
import numpy as np import matplotlib.pyplot as plt from scipy.signal import find_peaks #defining the x and y arrays x = np.linspace(0,10, 100) y = x*np.random.randn(100)**2 #Find peaks peaks = find_peaks(y, height = 1, threshold = 1, distance = 1) height = peaks['peak_heights'] #list of the heights of the peaks peak_pos = x[peaks] #list of the peaks positions #Finding the minima y2 = y*-1 minima = find_peaks(y2) min_pos = x[minima] #list of the minima positions min_height = y2[minima] #list of the mirrored minima heights #Plotting fig = plt.figure() ax = fig.subplots() ax.plot(x,y) ax.scatter(peak_pos, height, color = 'r', s = 15, marker = 'D', label = 'Maxima') ax.scatter(min_pos, min_height*-1, color = 'gold', s = 15, marker = 'X', label = 'Minima') ax.legend() ax.grid() plt.show()
Let’s dive into this code step-by-step!
Importing the Needed Python Libraries
Let’s start our script by importing the Python libraries that will be then used in the script.
import numpy as np from scipy.signal import find_peaks import matplotlib.pyplot as plt
Creating a Function with Peaks
The first thing that we have to do is to create a function, which should present some peaks.
This means to create the “x” and “y” arrays that will be then processed and plotted in our script.
- We start by using the
.linspace()function from Numpy, to define the
xarray, we call it “x”; it consists of an array of 100 equally spaced numbers.
- To generate the
yarray, we make use of the function
.randn()from the random package (from Numpy too), which returns a sample from a standard distribution (see additional documentation here: https://numpy.org/devdocs/reference/random/generated/numpy.random.randn.html), we just have to specify as an input parameter, the size of the generated array, in this case, we have to match the length of the x array, so 100.
We then modify a little bit more this array, by squaring its elements and multiplying them for the respective elements of the “x” array. The following code lines describe what has been explained so far.
#x and y arrays x = np.linspace(0, 10, 100) y = x*np.random.randn(100)**2
Finding the Peaks of the Function
Once determined the
y arrays, the next step is to identify the peaks positions and their value.
To do this, we exploit the function
.find_peaks(), which belongs to the package
.signal of the Scipy library (additional documentation can be found here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html). The only mandatory input for this function is the signal that we are interested in. However, the function features a lot of interesting options that can help us to refine our processing task:
- Height: it can be a number or an array and it is used to specify the minimal height that a peak should have, in order to be identified;
- Threshold: is the required vertical distance between a peak and its neighboring, very useful in the case of noisy functions where we want to avoid selecting peaks from the noise;
- Distance: is the required minimal horizontal distance between neighboring peaks; it can be really useful in cases in which we have some knowledge about the periodicity of the peaks.
There are then many other options that we can exploit, for example for specifying the minimal width of the peaks etc…
The output of the
.find_peaks() function is an array that contains the indexes of each peak that has been identified. It can return also other information, in the case we had previously specified some options like “height” or “threshold” at the moment of the call.
In that case, the function returns an array of arrays, the first subarray still contains the indexes of the peaks, the others may present the heights of the found peaks or their left and right thresholds (and all the other information that were previously specified as optional input during the call of the function), as a dictionary.
After this brief explanation, let’s see in the following code lines how to call the function and thus finding the peaks.
#Find peaks peaks = find_peaks(y, height = 1, threshold = 1, distance = 1) height = peaks['peak_heights'] #list containing the height of the peaks peak_pos = x[peaks] #list containing the positions of the peaks
As can be seen in the code lines above, we gave as input the “y” array and then we specified some other optional parameters (I set them all equal to 1 since I did not know what the aspect of my function was; I only knew that all the numbers were positive, since the function is squared).
Since we specified the optional parameters “height”, the output of the function (“peaks”), consists of an array, the first element is a subarray containing the positions of the peaks, the second subarray is a dictionary which contain all the information specified in the optional input parameters given at the moment of the call.
We can exploit this powerful feature for extracting the heights of the peaks; the only thing to do is to define an array, “height”, which will be equal to the array contained at the dictionary key “peak_heights”.
We can then create an array containing the positions of the peaks along the x array by exploiting the first subarray of the “peaks” array, i.e.
peaks and use it as index of our “x” array. In this way we can store in an array called “peak_pos”, just the positions of the points, along the “x” array, corresponding to peaks. The arrays “height” and “peak_pos” are the ones that will be used to plot the peaks on the initial function.
What about the minima?
So far, we have seen how to identify the position and calculate the height of our peaks. For some application we might be interested in analyzing also the minima (or the trough) of our signals. The following lines will demonstrate an easy strategy to accomplish this task.
.find_peaks() is only able to spot and analyze the peaks of a function; to solve this problem we have “to trick” the function by changing the input signal.
One practical way to do this is to mirror our signal; if we mirror a function with respect to the horizontal axis, the points that corresponded to its minima will be then converted into its new maxima or peaks.
After that, we can just repeat the procedure explained in the previous paragraph. To mirror the function, we can just multiply the “y” array by -1 and store its value in a new array called “y2”. This time, when calling the function
.find_peaks(), we will not specify the “height” option, since the height of these peaks can correspond to negative numbers (in principle we are not sure what the mirrored minima will look like). We can leave all the other optional parameters, if we want to refine the analysis (I left them all equal to 1).
#Find minima y2 = y*-1 minima = find_peaks(y2, threshold = 1, distance = 1) min_pos = x[minima] #list containing the positions of the minima min_height = y2[minima] #list containing the height of the minima
As you can see, this time for obtaining the heights of the minima, we just indexed the “y2” array with the array containing the indexes of the peaks (the real mirrored minima of the original function “y”) and stored them in the array “min_height”. At this point, we have also the information about the minima of the original function, we just have to remember to mirror them again at the moment of plotting their value.
To see the result of our peak analysis, we now plot the original function, the peaks and the minima. The function is plotted as a continuous line while the peaks and the minima as single points (hence a scatterplot). Maxima/peaks will be plotted in red, using a diamond as a marker; on the other hand, minima are plotted in yellow, with a cross symbol. We finish our plot by adding the legend and the grid. The following code lines describe the just explained procedure.
#Plotting the function + peaks and minima fig = plt.figure() ax = fig.subplots() ax.plot(x,y) ax.scatter(peak_pos, height, color = 'r', s = 10, marker = 'D', label = 'maxima') ax.scatter(min_pos, min_height*-1, color = 'gold', s = 10, marker = 'X', label = 'minima') ax.legend() ax.grid() plt.show()
The final result is instead displayed in Figure 1.
Figure 1: Initial function (blue curve) with the identified peaks (the maxima, red diamonds) and minima (yellow crosses).
As can be seen from Figure 1, we have successfully identified most the maxima/peaks and the minima of the initial function. Some minor peaks have not been taken into account in the analysis; if we were also interested in those ones, we should tune the optional parameters like the threshold and the height and iterate multiple times the same procedure.