The np.arange() function appears in 21% of the 35 million GitHub repositories that use the NumPy library! This illustrated tutorial shows you the ins and outs of the NumPy arange function. So let’s get started!
What’s the NumPy Arange Function?
The np.arange([start,] stop[, step]) function creates a new NumPy array of evenly spaced values between start (inclusive) and stop (exclusive). The step size defines the difference between subsequent values. For example, np.arange(1, 6, 2) creates the NumPy array [1, 3, 5].
The following code snippet shows the four most important uses of the NumPy arange function:
import numpy as np

# np.arange(stop)
>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# np.arange(start, stop)
>>> np.arange(2, 10)
array([2, 3, 4, 5, 6, 7, 8, 9])

# np.arange(start, stop, step)
>>> np.arange(2, 10, 2)
array([2, 4, 6, 8])

# np.arange(start, stop, step, dtype)
>>> np.arange(2, 10, 2, float)
array([2., 4., 6., 8.])
The examples show all four variants of using the NumPy arange function with one, two, three, or four arguments (parameters). Take the time to study all four cases in the rest of the article.
To master the NumPy arange function, read over the following basic function calls with different sets of arguments.
The NumPy arange function (commonly misspelled: NumPy arrange) creates a NumPy array of evenly spaced numbers within a fixed interval.
One Argument: numpy.arange(stop)
Here’s the most basic example of the NumPy arange function. You only specify the stop argument.
Let’s say, you want to represent the seven weekdays with seven sequential numbers — 0: Monday, 1: Tuesday, 2: Wednesday, 3: Thursday, 4: Friday, 5: Saturday, 6: Sunday. So your NumPy array will start with index 0 and end with index 6.
Similar to many other sequence operations,
np.arange() starts at index 0. If you specify one argument, it is the same as setting the start argument to 0.
Here is an example of the NumPy arange function with a single argument:
>>> np.arange(7)
array([0, 1, 2, 3, 4, 5, 6])
The result is a NumPy array with seven elements starting from the implicitly chosen index 0 (inclusive) and ending in the explicitly chosen index 7 (exclusive).
But what if we want to define a start index? Let’s say you feel anxious every Sunday evening because you hate going to work on Mondays for a big accountancy firm. First, you should get a new job as a programmer! Second, to stop working Mondays, we need to skip the start index 0.
Two Arguments: numpy.arange(start, stop)
Your boss at the big accountancy firm is nice and has agreed to let you take every Monday off. Yay! Now we need to update the company records.
If you add a second argument to np.arange(), NumPy interprets the first one as the start index and the second one as the end index.
Here is an example of the NumPy arange function with two arguments:
>>> np.arange(1, 7)
array([1, 2, 3, 4, 5, 6])
Ok, we mastered the two-argument version as well. Let’s move on to the three-argument version.
Three Arguments: np.arange(start, stop, step)
You have been studying Python programming every Monday for a few months and have just landed your first freelancing gig! Congratulations! You now have enough money to quit your boring accountancy job. Yay! Your new boss has asked you to fill in the days you want to work.
You love starting your week on Tuesdays. But you’d also like more time off. So you say you’ll work every second day. Luckily, you have a solid salary as a programmer, so this isn’t a problem!
To do this, add a third argument to np.arange to set the step size. It is 2 in this case because you work every second day.
>>> np.arange(1, 7, 2)
array([1, 3, 5])
Well done, now you only work on Tuesdays, Thursdays, and Saturdays! All programmers are lazy and you are lazy! It’s a perfect job for you.
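To double-check which days those numbers stand for, you can use the result of np.arange() to index into an array of weekday names. This is only a small sketch: the weekday_names lookup array is a hypothetical helper that follows the 0: Monday to 6: Sunday numbering used in this tutorial.

```python
import numpy as np

# Hypothetical lookup table: position i holds the name of weekday i.
weekday_names = np.array(
    ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
)

# Every second day, starting on Tuesday (index 1). The stop index 7 is excluded.
working_days = np.arange(1, 7, 2)

# Indexing one array with another selects the matching names.
working_day_names = weekday_names[working_days]

print(working_days)       # [1 3 5]
print(working_day_names)  # ['Tuesday' 'Thursday' 'Saturday']
```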
But wait, there’s a problem… let’s move on to the four-argument function call to solve it.
Four Arguments: np.arange(start, stop, step, dtype)
As you put your working days
[1, 3, 5] into your company’s tracking system, it complains. Your NumPy array is formatted incorrectly. It expects all NumPy array values to be floats rather than integers. “What a design flaw for weekday data!”.
Thankfully, there is a simple fix. We need to use the fourth argument of np.arange() to set the data type (dtype) of the output array. The dtype argument accepts two kinds of data types. First, traditional language-specific data types such as float and int. Second, NumPy-specific data types such as np.int16 or np.float32.
For instance, the NumPy-specific data types np.int16 and np.float32 allow for an integer value with 16 bits (=2 bytes) or a float value with 32 bits (=4 bytes). Keep in mind that more bits lead to higher memory overhead. But they give you a greater range of numbers to work with (or greater precision in the case of floats). Here is a collection of dtypes you can use (check out this excellent post if you need more information about NumPy dtypes):
bool: The default boolean data type in Python (1 Byte).
int: The default integer data type in Python (4 or 8 Bytes).
float: The default float data type in Python (8 Bytes).
complex: The default complex data type in Python (16 Bytes).
np.int8: Integer (1 Byte).
np.int16: Integer (2 Bytes).
np.int32: Integer (4 Bytes).
np.int64: Integer (8 Bytes).
np.float16: Float (2 Bytes).
np.float32: Float (4 Bytes).
np.float64: Float (8 Bytes).
By default, NumPy infers the dtype from the arguments: on most platforms, it chooses np.int64 for integers and np.float64 for floats. So only specify a different dtype if you want something other than those. At the start of your Python journey, it’s unlikely you will need to deeply understand the different dtypes. Once you start working on more complex problems, you will need this knowledge though.
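If you are unsure which dtype you ended up with, you can inspect the dtype attribute of the result. The sketch below compares integer arguments, a float argument, and an explicit dtype. The variable names are only illustrative, and the default integer type depends on your platform and NumPy version.

```python
import numpy as np

ints = np.arange(5)                      # integer arguments only
floats = np.arange(5.0)                  # a float argument
forced = np.arange(5, dtype=np.float32)  # explicit dtype overrides the default

print(ints.dtype)    # platform default integer type, typically int64
print(floats.dtype)  # float64
print(forced.dtype)  # float32
```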
Lastly, note that you can only spot differences once numbers get large. The largest
np.int8 is 127 but the largest
np.int16 is 32767. Yet, if you compare
np.int8 127 and
np.int16 127, they are the same.
>>> np.int8(127) == np.int16(127)
True
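You don’t have to memorize these limits. As a quick sketch, np.iinfo() reports the smallest and largest value of an integer dtype, and the itemsize attribute reports the number of bytes per element. The limits dictionary is just an illustrative container.

```python
import numpy as np

# Collect (min, max, bytes per element) for each integer dtype.
limits = {}
for dtype in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(dtype)
    limits[dtype.__name__] = (info.min, info.max, np.dtype(dtype).itemsize)

for name, (smallest, largest, n_bytes) in limits.items():
    print(name, smallest, largest, n_bytes)
```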
Here is an example of the 4-argument version of np.arange():
>>> np.arange(1, 7, 2, dtype=np.float32)
array([1., 3., 5.], dtype=float32)
You’ve completed the first part of the NumPy arange tutorial! But the one who prepares best wins. Let’s dive into some practice examples and attack the highest level of NumPy arange expertise!
You have mastered all four different uses of the NumPy arange function. To wrap things up, make sure to work through this more comprehensive list of examples:
>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> np.arange(2, 10, 3)
array([2, 5, 8])

# Non-integer step sizes
>>> np.arange(1.5, 10., 1.5)
array([1.5, 3. , 4.5, 6. , 7.5, 9. ])

# Create np.arange in reverse with a negative step size
# Start is inclusive, end is exclusive so it stops at 2
>>> np.arange(10, 1, -1)
array([10, 9, 8, 7, 6, 5, 4, 3, 2])

# Reversed and select every second value
>>> np.arange(10, 1, -2)
array([10, 8, 6, 4, 2])

# Reversed non-integer step size
>>> np.arange(9.7, 9.2, -0.1)
array([9.7, 9.6, 9.5, 9.4, 9.3])

# End being exclusive beats the start being inclusive
>>> np.arange(1, 1, 1)
array([], dtype=int64)

>>> np.arange(1, 5, dtype=np.int16)
array([1, 2, 3, 4], dtype=int16)

>>> np.arange(1, 5, dtype=np.float32)
array([1., 2., 3., 4.], dtype=float32)
I‘ve also added several NumPy arange puzzles to my puzzle-based learning app Finxter.com. Check it out to train yourself and become a master coder.
Congratulations, you now know the most important details about the NumPy arange function. But you may still have a few questions. Let’s answer them one by one!
np.arange vs np.linspace – When Should I Use Which One?
Use np.arange() if you want to create sequences of evenly spaced integers within a fixed interval.
Use np.linspace() if you have a non-integer step size. The NumPy linspace function handles the endpoints better. This prevents you from introducing unnecessary bugs into your code. One such bug is when you assume an endpoint is not in the NumPy array but it is because of floating-point arithmetic.
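Here is a small sketch of that endpoint surprise. With a step of 0.1, you might expect np.arange(0.1, 0.4, 0.1) to return three values, but floating-point rounding makes it return four. With np.linspace() you state the number of values explicitly, so the length is never in doubt. The variable names are only illustrative.

```python
import numpy as np

# The stop value 0.4 should be excluded, but rounding errors
# in (stop - start) / step push the computed length up to 4.
with_arange = np.arange(0.1, 0.4, 0.1)

# linspace takes the number of values directly;
# endpoint=False excludes the stop value, as arange is meant to.
with_linspace = np.linspace(0.1, 0.4, num=3, endpoint=False)

print(len(with_arange))    # 4
print(len(with_linspace))  # 3
```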
If you want to understand the NumPy linspace() function in detail, check out our blog article.
What Are Some np.arange() Use Cases?
NumPy is mostly about multi-dimensional matrices. It is common to create a 1D NumPy array with the NumPy arange function and to transform it immediately into a 2D array using the np.reshape() function. Below we create a 2D array with three rows and two columns from a 1D array.
np.arange(6).reshape((3, 2))
# array([[0, 1],
#        [2, 3],
#        [4, 5]])
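If you only know one of the two dimensions, you can pass -1 for the other and let NumPy work it out from the array length. A minimal sketch with illustrative variable names:

```python
import numpy as np

flat = np.arange(12)

# -1 tells reshape to infer this dimension: 12 values / 3 rows = 4 columns.
grid = flat.reshape(3, -1)

print(grid.shape)  # (3, 4)
print(grid)
```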
np.arange vs range – What’s the Difference?
You’re probably familiar with the built-in
range() function. We use it all the time to write for loops and list comprehensions.
>>> for i in range(5):
...     print(i)
...
0
1
2
3
4

# Square numbers from 0-4
>>> squares = [i**2 for i in range(5)]
>>> squares
[0, 1, 4, 9, 16]
So can we do the same with
np.arange? And even if we can, should we?
Let’s first see if it’s possible.
# Works the same as range()
>>> for i in np.arange(5):
...     print(i)
...
0
1
2
3
4

# Also works the same as range()
>>> np_squares = [i**2 for i in np.arange(5)]
>>> np_squares
[0, 1, 4, 9, 16]

# They're even identical!
>>> np_squares == squares
True
So it seems like we can do the same with both functions. But we’re working with a tiny amount of data. What happens if we work with much larger numbers?
np.arange vs range – Working with Big Data
We’ll use IPython’s magic function %timeit to see how long our for loops take when using np.arange and range.
# Only works in IPython
In : %timeit for i in np.arange(1000000): pass
78.1 ms ± 4.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In : %timeit for i in range(1000000): pass
32.8 ms ± 899 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# range is more than twice as fast!
In : 78.1 / 32.8
Out: 2.381...
This happens because the two functions work differently. The built-in range() is lazy: it returns a small range object that computes the next value only when the loop asks for it and stops once it has run out. This makes it very efficient. But np.arange creates an array of length 1,000,000 and stores it in memory. This is computationally expensive and slows the loop down. Thus, if you only need numbers to loop over, you should use range(). It’s much faster.
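You can see the difference in memory use without any timing. The sketch below compares the size of a range object with the size of the data buffer that np.arange() allocates. The exact byte counts depend on your Python build and platform, so treat the numbers as rough.

```python
import sys
import numpy as np

n = 1_000_000

lazy = range(n)       # stores only start, stop, and step
eager = np.arange(n)  # stores all one million values

range_bytes = sys.getsizeof(lazy)  # size of the range object itself
array_bytes = eager.nbytes         # size of the array's data buffer

print(range_bytes)  # a few dozen bytes
print(array_bytes)  # n * itemsize, i.e. several megabytes
```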
However, there is a time when you should not use range.
Iterating Over Large NumPy Arrays
Let’s say you have a NumPy array, A, containing 1 million values. You want to create a new array, B, by performing a calculation on each element of A. For this example, we will add 1 to every element of A to get B.
First, we create an array of 1 million random numbers using the random module and a list comprehension. Check out our article to learn more about this built-in library.
>>> import random

# Set seed so we can reproduce our results
>>> random.seed(1)

# Use list comprehension to generate random numbers
>>> A = [random.random() for i in range(1000000)]

# Convert to a numpy array
>>> A = np.array(A)

# Check first 5 values (you should have the same as me)
>>> A[:5]
[0.13436424411240122, 0.8474337369372327, 0.763774618976614, 0.2550690257394217, 0.49543508709194095]
Let’s first create B using a for loop:
# Start with an empty list (NumPy arrays have no .append() method)
>>> B = []
>>> for i in A:
...     B.append(i + 1)
...

# Check first 5 values - looks good
>>> B[:5]
[1.134364244112401, 1.8474337369372327, 1.7637746189766141, 1.2550690257394217, 1.4954350870919408]
Let’s time it:
# Only works in IPython
In : B = np.array([])

# NumPy arrays don't have an .append() method so we use np.append()
In : %timeit for i in A: np.append(B, i+1)
6.91 s ± 628 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
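One detail is easy to miss here: np.append() does not change the array you pass in. It returns a new array that holds a copy of the old values plus the new one, which is a large part of why calling it in a loop is slow. A minimal sketch with illustrative values:

```python
import numpy as np

B = np.array([])

# np.append returns a new array; B itself is left untouched.
result = np.append(B, 1.5)
print(B.size)   # 0
print(result)   # [1.5]

# To actually collect values, reassign the result on every iteration.
for value in (1.0, 2.0, 3.0):
    B = np.append(B, value)
print(B)        # [1. 2. 3.]
```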
That is rather slow. Can we make this any faster? Because NumPy arrays are iterable and can be created from lists, we can do this using a list comprehension. Let’s see if this improves the speed.
In : %timeit B = np.array([i+1 for i in A])
600 ms ± 19.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Wow! That’s 11.5x faster than using a for loop! But can we get even faster? Yes, we can and the answer is: vectorised computation.
Most of the time, the operation you want to perform can be done using NumPy’s built-in functions. You should always use these as they have been optimised by the NumPy developers. Check out this Stack Overflow answer for an introduction. We can, and probably will, write a full article on this topic soon. But for now, check out Chapter 4 from Python for Data Analysis by Wes McKinney.
The fastest way to solve this is to use NumPy’s broadcasting property i.e. if we +1 to a NumPy array it ‘broadcasts’ this to all the elements of the array.
In : %timeit B = A + 1
1.99 ms ± 386 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
This is about 3,470x faster than using a for loop (6.91 s versus 1.99 ms). It will take some time to get used to using vectorised computations. But once you get used to it, you will save a lot of time.
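If you are not working in IPython, you can run a similar comparison with the timeit module from the standard library. The sketch below is one possible setup, not the benchmark used above: it uses a smaller array, NumPy’s own random generator, and two hypothetical helper functions. It first checks that both approaches give identical results and then times them. Your timings will differ from machine to machine.

```python
import timeit
import numpy as np

# Smaller array than in the article so the script finishes quickly.
rng = np.random.default_rng(seed=1)
A = rng.random(100_000)

def with_list_comprehension(values):
    """Add 1 to every element, one Python-level step at a time."""
    return np.array([v + 1 for v in values])

def vectorised(values):
    """Add 1 to every element in a single NumPy operation."""
    return values + 1

# Both approaches must produce the same array.
same_result = np.array_equal(with_list_comprehension(A), vectorised(A))
print(same_result)

# Time three runs of each approach.
loop_time = timeit.timeit(lambda: with_list_comprehension(A), number=3)
vec_time = timeit.timeit(lambda: vectorised(A), number=3)
print(f"list comprehension: {loop_time:.4f} s, vectorised: {vec_time:.4f} s")
```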
np.arange() – A Short Summary
First, import the NumPy library:
import numpy as np
Now, you can use the NumPy arange function to create sequences with equal step sizes in various ways. Go over the table and study the examples thoroughly:
|Example||Resulting NumPy Array|
Where to Go From Here?
How to join the top earners in any field? Read more books!
Because I always find it difficult to find time to learn, I have written a new NumPy book that can be entirely consumed in small doses — for example as you drink your daily morning coffee.
Don’t miss out on this exciting new way of learning to code — it’s so much more fun! 🙂