In this article, I’ll explain the np.linspace function, how to use it, and when you should. It has got a bit of a reputation for being complicated but, as you’ll see, it really isn’t! So, let’s get a quick overview first.
numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
|Argument|Default|Description|
|start|–|The starting value of the sequence.|
|stop|–|The ending value of the sequence.|
|num|50|The number of samples to generate. Must be non-negative (you can’t generate a number of samples less than zero!).|
|retstep|False|Whether to return a step value in the calculation. Step is the distance between each value.|
Return value: By default, the function returns a NumPy array of evenly distributed samples between start and stop. But if you set retstep=True, it’ll also return the step value.
|samples|NumPy array of samples in the interval [start, stop]|
|step|Numerical value giving the space between two samples (only if retstep=True)|
Let’s look at the three most common arguments in more detail first:
Here’s what the official NumPy docs have to say:
numpy.linspace(start, stop, num=50)
Return evenly spaced numbers over a specified interval. Returns num evenly-spaced samples. The endpoint of the interval can optionally be excluded.
Note: as the name suggests, np.linspace returns numbers that are linearly spaced. Thus, they are all the same distance apart from one another (think of points on a line).
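To make “the same distance apart” concrete, here is a minimal sketch. It assumes the endpoint is included (the default), in which case the spacing should work out to (stop - start) / (num - 1). The values and variable names are purely illustrative.

```python
import numpy as np

# Illustrative values only: 5 samples from 0 to 1, endpoint included.
start, stop, num = 0.0, 1.0, 5
samples = np.linspace(start, stop, num=num)

# With the endpoint included there are num - 1 gaps between num samples.
expected_step = (stop - start) / (num - 1)

# np.diff gives the gap between each pair of neighbouring samples.
gaps = np.diff(samples)

print(samples)
print(gaps)
print(np.allclose(gaps, expected_step))
```

Every gap matches the expected step, which is what “linearly spaced” means in practice.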
From the definition, it follows that np.linspace(-3, 3) will give us 50 numbers evenly spaced in the interval [-3, 3].
Let’s check this with some code.
Exercise: Can you reduce the number of samples to 10?
>>> import numpy as np
>>> A = np.linspace(-3, 3)
>>> type(A)
numpy.ndarray
# Number of elements in A
>>> len(A)
50
# First element of A
>>> A[0]
-3.0
# Last element of A
>>> A[-1]
3.0
# The difference between every value is the same: 0.12244898
>>> np.diff(A)
array([0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898,
       0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898,
       0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898,
       0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898,
       0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898,
       0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898,
       0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898, 0.12244898])
If we want only 10 samples between -3 and 3, we set num=10.
>>> B = np.linspace(-3, 3, num=10)
# B only contains 10 elements now
>>> len(B)
10
Let’s define a simple function:
def f(x):
    return x * (x - 2) * (x + 2)
If you remember your high school maths, you’ll know that this is a positive cubic that intersects the x-axis at 0, 2 and -2. Thus, the area of interest on the x-axis is the interval from -3 to 3.
Now we plot it using the same np.linspace() as above (renamed for greater readability).
import matplotlib.pyplot as plt
import numpy as np

x_values = np.linspace(-3, 3)
plt.plot(x_values, f(x_values))
# Add labels
plt.title('Line Plot of f(x) Using np.linspace')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.show()
Note 1: Because np.linspace returns a NumPy array, we can apply entire functions to it element-wise. This makes it super easy to work with.
Note 2: I’ve left out the code adding titles and axis labels from now on to save space.
To see what’s happening on a deeper level, let’s make a scatter plot of the same data.
plt.scatter(x_values, f(x_values))
plt.show()
Now let’s look at what happens if you don’t use np.linspace().
np.linspace vs np.arange
You may have encountered a function similar to np.linspace, namely np.arange. As the name suggests, it returns a range of values between the given start and stop values.
Let’s see what happens if we replace np.linspace with np.arange in our code above:
x_values = np.arange(-3, 3)
plt.plot(x_values, f(x_values))
plt.show()
What’s happened? Let’s draw a scatter plot and see what’s happening in more detail.
Looking at that and what np.arange() returns, we see the problem.
>>> np.arange(-3, 3)
array([-3, -2, -1, 0, 1, 2])
We only get six x-values, spaced one integer apart and we don’t even get 3 included at the end! Since we need a large number of x-values for our line plot to look smooth, this is not good enough.
Can’t we solve this by setting the step to something other than 1, say to 0.1? We can, but the NumPy docs explicitly recommend against doing so, as this leads to inconsistencies between results. The reasons for this are outside the scope of this article. It’s best practice to use np.linspace, and your older self will thank you if you build good habits now.
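As a small illustration of the kind of inconsistency meant here, the sketch below compares the two functions with a step of 0.1. The specific numbers (1, 1.3, 0.1) are my own choice for illustration. On typical machines the np.arange call returns a value at roughly 1.3 even though np.arange normally leaves the stop value out, because its length is derived from a floating-point division.

```python
import numpy as np

# np.arange works out its length from (stop - start) / step, so
# floating-point rounding can change how many values come back.
with_arange = np.arange(1, 1.3, 0.1)

# np.linspace is told the number of samples directly, so the length
# and the two end values are never in doubt.
with_linspace = np.linspace(1, 1.3, num=4)

print(len(with_arange), with_arange)
print(len(with_linspace), with_linspace)
```

With np.linspace you state how many samples you want, so there is nothing left for rounding to decide.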
You may want to plot a function of more than one variable such as
def g(x, y):
    return (x - y)**3 * (3*x**2 + y)
In this case, you don’t just need np.linspace but also np.meshgrid. Short explanation: if your function is N-dimensional, np.meshgrid will take N np.linspace arrays as input.
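Here is a minimal sketch of that idea for the two-variable function g. The ranges and sample counts are arbitrary choices for illustration, and the names X, Y and Z are just a common convention rather than anything required by NumPy.

```python
import numpy as np

def g(x, y):
    return (x - y)**3 * (3*x**2 + y)

# One np.linspace array per variable (ranges and sizes chosen arbitrarily).
x_values = np.linspace(-3, 3, num=5)
y_values = np.linspace(-2, 2, num=4)

# np.meshgrid turns the two 1-D arrays into 2-D coordinate grids.
X, Y = np.meshgrid(x_values, y_values)

# g is applied element-wise, giving one value per (x, y) pair.
Z = g(X, Y)

print(X.shape, Y.shape, Z.shape)
```

All three arrays share the same shape, so Z holds the function value for every combination of an x-value and a y-value. Grids like these are what 2-D plotting functions such as plt.contourf typically expect.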
All Arguments Explained
Here are all possible arguments and their defaults for np.linspace:
np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
start, stop – array-like
The starting and ending value of the sequence respectively. You can pass lists or arrays to get many linear spaces inside one array. These can be accessed through normal NumPy slicing.
# Linear spaces [1-4], [2-4] and [3-4] in one array
>>> np.linspace([1, 2, 3], 4, num=5)
array([[1.  , 2.  , 3.  ],
       [1.75, 2.5 , 3.25],
       [2.5 , 3.  , 3.5 ],
       [3.25, 3.5 , 3.75],
       [4.  , 4.  , 4.  ]])
# Linear spaces [1-4], [2-5] and [3-6] in one array
>>> np.linspace([1, 2, 3], [4, 5, 6], num=5)
array([[1.  , 2.  , 3.  ],
       [1.75, 2.75, 3.75],
       [2.5 , 3.5 , 4.5 ],
       [3.25, 4.25, 5.25],
       [4.  , 5.  , 6.  ]])
num – int, default 50
The number of samples to generate. Must be non-negative (you can’t generate a number of samples less than zero!).
endpoint – bool, default True
If True, the endpoint is included in the sample; if False, it isn’t.
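A quick sketch of the difference, using values I picked so the results are easy to read:

```python
import numpy as np

# Endpoint included (the default): the last sample is the stop value.
with_endpoint = np.linspace(0, 1, num=4)

# Endpoint excluded: still 4 samples, but the stop value 1 is left out,
# so the samples sit at 0, 0.25, 0.5 and 0.75.
without_endpoint = np.linspace(0, 1, num=4, endpoint=False)

print(with_endpoint)
print(without_endpoint)
```

Note that the number of samples stays the same in both cases; only the spacing changes.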
retstep – bool, default False
Whether to return a step value in the calculation. Step is the distance between each value.
If True, np.linspace returns (samples, step) as a tuple.
>>> sample, step = np.linspace(1, 2, num=5, retstep=True)
>>> sample
array([1.  , 1.25, 1.5 , 1.75, 2.  ])
>>> step
0.25
dtype – dtype, default None
The dtype of all elements in the output array (remember, NumPy arrays only contain elements of one type!). If dtype=str, all values will be strings; likewise, if dtype=int, all values will be integers.
Being honest, I can’t think of many cases when you would want to use this functionality. Usually, you will use np.linspace to create an array of floats between two numbers. If you want to create an array of ints, np.arange is much better. Firstly, its default setting is to return an array of ints. Secondly, it acts like the built-in Python range() function you already know and love! But if you come up with some use cases for this, please let me know in the comments!
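To show why dtype=int is rarely what you want, here is a small sketch with numbers of my own choosing. The float samples are converted to integers, so the result is no longer evenly spaced.

```python
import numpy as np

# Default behaviour: evenly spaced floats (0, 2.5, 5, 7.5, 10).
as_float = np.linspace(0, 10, num=5)

# Same call with dtype=int: the fractional parts are dropped.
as_int = np.linspace(0, 10, num=5, dtype=int)

print(as_float)
print(as_int)
# The gaps between the integer samples are no longer all equal.
print(np.diff(as_int))
```

The gaps alternate between 2 and 3, which defeats the point of asking for linearly spaced values.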
axis – int, default 0
If start or stop is array-like, we can set the axis along which we will store the samples.
# Store the 50 samples on the rows (default behaviour)
>>> np.linspace([1, 2, 3], 4, axis=0).shape
(50, 3)
# Store the 50 samples along the columns
>>> np.linspace([1, 2, 3], 4, axis=1).shape
(3, 50)
And that’s all for the np.linspace function! You now know almost everything there is to know! It wasn’t that bad after all, was it?
If you have any questions please put them in the comments and I’ll get back to you as soon as I can!
If you liked this and are wondering if NumPy has different but similar functions then the answer is yes! Below are some to check out:
- np.geomspace – numbers are spaced evenly on a log scale (geometric progression)
- np.logspace – similar to geomspace but the endpoints are specified as logarithms
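To see how these two relate to np.linspace, here is a brief sketch with endpoints I chose for illustration (1 to 1000). It assumes the default base of 10 for np.logspace.

```python
import numpy as np

# Evenly spaced on a linear scale: 1, 334, 667, 1000.
linear = np.linspace(1, 1000, num=4)

# Evenly spaced on a log scale: each value is 10 times the previous one.
geometric = np.geomspace(1, 1000, num=4)

# Same values, but the endpoints are given as exponents of 10 (10**0 to 10**3).
logarithmic = np.logspace(0, 3, num=4)

print(linear)
print(geometric)
print(logarithmic)
```

The last two calls produce the same numbers; they differ only in how you specify the endpoints.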
This article is contributed by Finxter user Adam Murphy (data scientist):
I am a self-taught programmer with a First Class degree in Mathematics from Durham University and have been coding since June 2019.
I am well versed in the fundamentals of web scraping and data science and can get you a wide variety of information from the web very quickly.
I recently scraped information about all watches that Breitling and Rolex sell in just 48 hours and am confident I can deliver datasets of similar quality to you whatever your needs.
Being a native English speaker, my communication skills are excellent and I am available to answer any questions you have and will provide regular updates on the progress of my work.