
What Are the Advantages of NumPy Arrays over Regular Python Lists?

The Python built-in list data type is powerful. However, the NumPy array has many advantages over Python lists. What are they?

1. More powerful slicing functionality.

NumPy slicing is more powerful than regular list slicing. For example, NumPy lets you assign a single value to an extended slice (a slice with a step), while a Python list does not.

import numpy as np


l = list(range(10))
try:
    l[::2] = 999
except TypeError as e:
    print(e)
# must assign iterable to extended slice


a = np.arange(10)
a[::2] = 999
print(a)
# [999   1 999   3 999   5 999   7 999   9]

In both cases, it is clear that the user wants to assign 999 to every other element. The list raises a TypeError because an extended slice only accepts an iterable of matching length, whereas NumPy broadcasts the single value across the slice.
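For comparison, a plain list can reach the same result, but only when the right-hand side is an iterable with exactly as many items as the slice selects. A minimal sketch of that workaround:

```python
l = list(range(10))

# An extended slice of a list needs an iterable with exactly as many
# items as the slice selects (here: indices 0, 2, 4, 6, 8).
l[::2] = [999] * len(l[::2])
print(l)
# [999, 1, 999, 3, 999, 5, 999, 7, 999, 9]
```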

On top of that, NumPy supports multi-dimensional slicing, which nested Python lists do not offer directly.

import numpy as np


a = np.arange(16)
a = a.reshape((4,4))
print(a)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]
#  [12 13 14 15]]

print(a[:, 1])
# Second column:
# [ 1  5  9 13]

print(a[1, :])
# Second row:
# [4 5 6 7]
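To see why this is less convenient with plain lists, here is a rough sketch of the same row and column selection on a nested list. The names rows, second_row and second_col are only illustrative.

```python
import numpy as np

rows = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]

# Second row: ordinary indexing works.
second_row = rows[1]

# Second column: there is no rows[:, 1] for lists,
# so each row has to be visited explicitly.
second_col = [row[1] for row in rows]

a = np.array(rows)
print(second_col == a[:, 1].tolist())
# True
```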

If you want to master the fine but powerful features of NumPy and become a data science pro, check out my book “Coffee Break NumPy”.

2. More efficient data representation.

NumPy arrays store their values in one contiguous, typed block of memory. This makes them more compact than lists and much faster for operations on whole arrays. Need more proof?

import numpy as np
import sys

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = np.array(x)

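print(sys.getsizeof(x))
# 144 bytes
# (counts only the list and its references, not the ten int objects)

print(sys.getsizeof(y))
# 136 bytes
# (array header plus data buffer; exact values for both
# depend on the Python and NumPy versions)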

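The reduced memory footprint of a NumPy array becomes even more pronounced for larger data sets: the array’s fixed header is amortized over more elements, while a list needs a separate Python object for every element, which sys.getsizeof(x) does not count.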
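A rough way to check this on a larger data set is to add the size of the element objects to the list’s own size and compare that with the array’s data buffer. This is only an estimate (small integers are shared between objects in CPython, for example), and the exact numbers depend on platform and versions:

```python
import sys
import numpy as np

n = 1000
x = list(range(n))
y = np.arange(n)

# sys.getsizeof(x) covers only the list object and its array of references;
# the int objects it points to are counted separately here.
list_total = sys.getsizeof(x) + sum(sys.getsizeof(i) for i in x)

# nbytes is the size of the raw data buffer: n * itemsize.
array_data = y.nbytes

print(list_total, array_data)
```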

Check out this great resource where you can check the speed of NumPy arrays vs Python lists.
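You can also get a quick impression yourself with the standard timeit module. The sketch below compares doubling every element with a list comprehension against a vectorized array operation. The size n and the repeat count are arbitrary choices, and the timings will vary from machine to machine:

```python
import timeit
import numpy as np

n = 100_000
x = list(range(n))
y = np.arange(n)

# Double every element: list comprehension vs. vectorized multiplication.
t_list = timeit.timeit(lambda: [v * 2 for v in x], number=20)
t_array = timeit.timeit(lambda: y * 2, number=20)

print(f"list:  {t_list:.4f} s")
print(f"array: {t_array:.4f} s")
```

Note that this measures whole-array operations. Reading single elements one by one in a Python loop is not necessarily faster with an array than with a list.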

3. More convenient.

This excellent StackOverflow answer provides a great example of how NumPy arrays are much more convenient in practice:

Read your data from a file and convert it to a three-dimensional cube:

x = np.fromfile("data", dtype=float).reshape((10, 10, 10))

Find cells that are greater than a certain threshold 0.1:

(x > 0.1).nonzero()

Sum along the first dimension:

x.sum(axis=0)

None of these capabilities exist for Python lists out of the box. Each one would need explicit loops or nested comprehensions, and there’s no way of creating a multi-dimensional Python list in such a concise manner!
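Here is a self-contained sketch of the three steps that you can run without an existing data file. The temporary file and its contents (1000 evenly spaced floats) are made up for illustration. The last lines show what the sum along the first dimension looks like with nested lists:

```python
import os
import tempfile
import numpy as np

# Hypothetical setup: write 1000 float64 values to a temporary raw binary file.
path = os.path.join(tempfile.mkdtemp(), "data")
np.linspace(0.0, 1.0, 1000).tofile(path)

x = np.fromfile(path, dtype=float).reshape((10, 10, 10))

# Indices of all cells above the threshold: one index array per dimension.
i, j, k = (x > 0.1).nonzero()

# Summing along the first dimension collapses it: (10, 10, 10) -> (10, 10).
s = x.sum(axis=0)
print(s.shape)
# (10, 10)

# A roughly equivalent sum over nested lists needs explicit loops.
nested = x.tolist()
s_list = [[sum(nested[a][b][c] for a in range(10)) for c in range(10)]
          for b in range(10)]
```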
