The Python built-in list data type is powerful. However, the NumPy array has many advantages over Python lists. What are they?
Advantages NumPy | Advantages Python Lists |
---|---|
Multi-dimensional Slicing | Library-Independent |
Broadcasting Functionality | Intuitive |
Processing Speed | Less Complicated |
Memory Footprint | Heterogeneous List Data Allowed |
Many Convenience Methods | Arbitrary Data Shape (Non-Square Matrix) |
Let’s dive into the most important advantages of NumPy arrays over Python lists.
1. More Powerful Slicing and Broadcasting Functionality.
In contrast to regular slicing, NumPy slicing is a bit more powerful. Here’s how NumPy handles an assignment of a value to an extended slice.
import numpy as np l = list(range(10)) l[::2] = 999 # Throws error --> assign iterable to extended slice a = np.arange(10) a[::2] = 999 print(a) # [999 1 999 3 999 5 999 7 999 9]
Regular Pythonβs slicing method is not able to implement the userβs intention as NumPy. In both cases, it is clear that the user wants to assign 999 to every other element in the slice. Numpy has no problems implementing this goal.
On top of that, NumPy can perform multi-dimensional slicing which is not convenient in Python.
import numpy as np a = np.arange(16) a = a.reshape((4,4)) print(a) # [ 0 1 2 3] # [ 4 5 6 7] # [ 8 9 10 11] # [12 13 14 15]] print(a[:, 1]) # Second column: # [ 1 5 9 13] print(a[1, :]) # Second row: # [4 5 6 7]
Try It Yourself: Execute this code snippet in the interactive Python shell in your browser.
If you want to master the fine but powerful features of NumPy and become a data science pro, check out my book “Coffee Break NumPy”.
2. More Efficient Data Representation.
NumPy arrays are much faster to access and create while having a smaller memory footprint. Need more proof?
import numpy as np import sys x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] y = np.array(x) print(sys.getsizeof(x)) # 144 bytes print(sys.getsizeof(y)) # 136 bytes
Try It Yourself: Interestingly, this doesn’t seem to be true in all environments. If I run this in my browser, I get a different result. I suspect it’s because the performance differences have been reduced in Python 3.8.
Exercise: Execute this code snippet in the interactive Python shell in your browser.
The reduced memory footprint of a NumPy array becomes even more pronounced for larger data sets.
Check out this great resource where you can check the speed of NumPy arrays vs Python lists.
3. More Convenient.
This excellent StackOverflow answer provides a great example of how NumPy arrays are much more convenient in practice:
Read your data from a file and convert it to a three-dimensional cube:
x = numpy.fromfile(file=open("data"), dtype=float).reshape((10, 10, 10))
Find cells that are greater than a certain threshold 0.1:
(x > 0.1).nonzero()
Sum along the first dimension:
x.sum(axis=0)
All of those capabilities do simply not exist in Python lists. There’s no way of creating a multi-dimensional Python list in such a concise manner!
Try It Yourself: Execute this code snippet in the interactive Python shell in your browser.
Exercise: What’s the output of this code snippet?
π Recommended Tutorial: Why Use NumPy Instead of List Operations?
Where to Go From Here?
Do you love NumPy? And do you want to build your Python specialization in the area of data science?
Then check out my book “Coffee Break NumPy”. The ebook comes with plenty of hand-crafted material, NumPy puzzles, cheat sheets, and even video tutorials that will boost your data science skill level. Your NumPy education is a critical keystone on your path to data science and machine learning mastery.