Why Use NumPy Instead of List Operations?

NumPy vs Lists

NumPy is an important Python library used for numerical operations and data science.

Mathematical operations on NumPy arrays are typically faster and more memory-efficient than the same operations on lists. NumPy's core is written largely in C, so the loop over the elements runs in compiled code rather than in the Python interpreter.

NumPy arrays are homogeneous: every element has the same data type, so functions do not need to check the type of each element, and the data is stored in contiguous memory locations.

Lists, however, can hold multiple types of data, and their elements are referenced through pointers rather than stored contiguously. NumPy also takes advantage of vectorization, which converts an algorithm that processes one value at a time, as a loop over a list does, into one that operates on a whole set of values (a vector or a matrix) at once.
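The difference between the two storage models can be inspected directly. The following is a minimal sketch; the variable names (mixed, slot_types, arr) are illustrative and not part of the original article.

```python
import numpy as np

# A list can mix types; each slot holds a reference to a separate object.
mixed = [1, 2.5, "three"]
slot_types = [type(x).__name__ for x in mixed]
print(slot_types)                  # ['int', 'float', 'str']

# An array has a single dtype; mixed numeric input is upcast to a common type.
arr = np.array([1, 2.5, 3])
print(arr.dtype)                   # float64
print(arr.flags["C_CONTIGUOUS"])   # True: the values sit in one contiguous buffer
print(arr.strides)                 # (8,): neighbouring values are 8 bytes apart
```

Because every element is a float64 at a fixed offset, a compiled loop can step through the buffer without inspecting each element's type.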

Example

For example, NumPy provides ufuncs or “Universal Functions” that operate element by element on the ndarray object, where ndarray stands for N-dimensional array.

When multiplying two lists in Python, an iterative statement is used.

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]

ab = []
for i in range(len(a)):
    ab.append(a[i] * b[i])

When using NumPy, the iterative statement is converted into a vector-based operation.

import numpy as np

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]

np.multiply(a, b)
# array([ 5, 12, 21, 32])
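As a sketch of how the two approaches relate, the snippet below puts a pure-Python version next to the NumPy ufunc and the equivalent * operator on arrays. The names ab_list, ab_ufunc, and ab_operator are illustrative only.

```python
import numpy as np

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]

# Pure-Python version: pair the elements up and multiply one pair at a time.
ab_list = [x * y for x, y in zip(a, b)]

# NumPy version: np.multiply is a ufunc, and * on arrays dispatches to it.
ab_ufunc = np.multiply(a, b)
ab_operator = np.array(a) * np.array(b)

print(ab_list)                             # [5, 12, 21, 32]
print(ab_ufunc)                            # [ 5 12 21 32]
print(isinstance(np.multiply, np.ufunc))   # True

# On a plain list, * means repetition, not element-wise multiplication.
print(a * 2)                               # [1, 2, 3, 4, 1, 2, 3, 4]
```

The last line shows why the lists have to be converted to arrays (or passed to a ufunc) before arithmetic operators behave element-wise.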

Single Instruction Multiple Data (SIMD)

Modern CPUs support vectorization with SIMD (Single Instruction Multiple Data).

NumPy’s compiled loops can take advantage of SIMD instructions where the CPU supports them, so that a single instruction processes several array elements at once. Some linear algebra routines are also delegated to BLAS and LAPACK libraries, which may use multiple cores.

This data-level parallelism improves performance when processing large amounts of data and has applications in many fields, such as science and finance.

Memory Comparison

NumPy may use much less memory to store data than regular Python lists.

The following example compares the memory used by a Python list and a NumPy array that each hold 10 integers.

import numpy as np
import sys


py_arr = [1,2,3,4,5,6,7,8,9,10]
numpy_arr = np.array([1,2,3,4,5,6,7,8,9,10])


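# Approximate size of the 10 int objects in the Python list (in bytes).
# The list's own array of pointers is not counted here.
sizeof_py_arr = sys.getsizeof(1) * len(py_arr)
print(sizeof_py_arr)
# 280 (on 64-bit CPython, where a small int takes 28 bytes)

# Size of the NumPy array's data buffer (in bytes).
sizeof_numpy_arr = numpy_arr.itemsize * numpy_arr.size
print(sizeof_numpy_arr)
# 80 (with a 64-bit integer dtype)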

The NumPy array uses 8 bytes to store each integer, so 10 * 8 = 80 bytes are used, whereas the integer objects referenced by the Python list take 28 bytes each, or 280 bytes in total, before counting the list's own array of pointers.
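For a fuller picture, the container overhead can be measured as well. This is a rough sketch rather than an exact accounting: the reported sizes depend on the Python build and NumPy version, and the variable names are illustrative. The dtype is fixed to int64 here (an assumption) so that the item size is 8 bytes on every platform.

```python
import sys
import numpy as np

py_list = list(range(1, 11))
numpy_arr = np.arange(1, 11, dtype=np.int64)

# The list object itself: a header plus one pointer per element.
list_container = sys.getsizeof(py_list)
# The int objects the pointers refer to. CPython shares small ints between
# containers, so this sum can overstate the memory unique to this list.
list_elements = sum(sys.getsizeof(x) for x in py_list)

# The array's raw data buffer, and the whole array object including its header.
array_buffer = numpy_arr.nbytes
array_total = sys.getsizeof(numpy_arr)

print("list container:", list_container)
print("list elements: ", list_elements)
print("array buffer:  ", array_buffer)   # 80
print("array total:   ", array_total)
```

For very small arrays the fixed header of the array object is a noticeable share of the total; the saving becomes clearer as the number of elements grows.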

Performance Evaluation

A Python list is an array of pointers that refer to objects in memory.

my_list = [1, 2, ['Alice', 'Bob'], 'hello world']

Each time we access an element, Python first retrieves the pointer and then follows it to the object's memory location. This extra indirection causes a significant performance decrease when processing lists.

💡 NumPy arrays, however, are homogeneous. They store a single type of data in contiguous memory locations, so an element can be read directly from the buffer without following a separate pointer, and access takes very little time.
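One way to observe the effect is a small timing experiment. The sketch below is illustrative only: the function names, the array length n, and the repetition count are arbitrary choices, and the measured times will vary by machine.

```python
import timeit
import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

def square_list():
    # Follows a pointer for every element and builds a new int object each time.
    return [x * x for x in py_list]

def square_array():
    # A single call; the loop over the contiguous buffer runs in compiled code.
    return np_arr * np_arr

list_time = timeit.timeit(square_list, number=20)
array_time = timeit.timeit(square_array, number=20)
print(f"list:  {list_time:.4f} s")
print(f"array: {array_time:.4f} s")
```

On typical hardware the array version is expected to be considerably faster, but the exact ratio depends on the operation, the data size, and the system.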

Advantages of using NumPy compared to regular Python lists

NumPy arrays 

  • typically consume less memory than lists.
  • perform numerical operations faster.
  • support scientific functions, such as linear algebra.
  • support element-wise operations on arrays.
  • are primarily used to perform operations on large amounts of numerical data quickly. That is why they are extensively used in machine learning and data analytics.
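Two of the points above, element-wise operations and linear algebra, can be sketched briefly. The system of equations and the variable names are made up for illustration.

```python
import numpy as np

# Element-wise arithmetic without an explicit loop.
v = np.array([1.0, 2.0, 3.0])
scaled = v * 2 + 1
print(scaled)              # [3. 5. 7.]

# Linear algebra: solve 2x + y = 5 and x + 3y = 10.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])
solution = np.linalg.solve(A, b)   # approximately x = 1, y = 3
print(solution)

# The matrix-vector product reproduces b up to floating-point rounding.
print(np.allclose(A @ solution, b))   # True
```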

Disadvantages of using NumPy compared to regular Python lists

Python lists 

  • are easier to modify. A list over-allocates its array of pointers, so appending an element is cheap. A NumPy array stores its elements in one contiguous, fixed-size block, so functions such as np.append and np.delete build a new array and copy the data, which is time-consuming.
  • can grow dynamically, whereas NumPy arrays are fixed-size.
  • are a built-in type. NumPy arrays are not built in; we have to install and import an external library to use them.
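The difference in how the two containers grow can be seen in a short sketch (variable names are illustrative):

```python
import numpy as np

# A list grows in place: the same object gains an element.
py_list = [1, 2, 3]
list_id_before = id(py_list)
py_list.append(4)
same_list = id(py_list) == list_id_before
print(same_list)                        # True

# np.append returns a new, larger array; the original is unchanged.
arr = np.array([1, 2, 3])
bigger = np.append(arr, 4)
print(arr.size, bigger.size)            # 3 4
print(np.shares_memory(arr, bigger))    # False: the data was copied

# np.delete likewise returns a copy with the chosen element removed.
trimmed = np.delete(bigger, 0)
print(trimmed)                          # [2 3 4]
```

When the final size is known in advance, a common alternative is to allocate the array once (for example with np.zeros) and fill it by index, which avoids the repeated copies.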

Summary

In summary, the functionality of NumPy compared to regular Python Lists is as follows:

NumPy | Regular Python Lists
Homogeneous data, typically numerical | Heterogeneous amounts and types of data
Allows multi-dimensional slicing | Allows one-dimensional slicing
Broadcasting possible for operations on arrays of different sizes | Iteration possible for operations on two same-size lists
Allows slice assignment easily | Slice assignment may be more complicated
Many convenient methods, e.g. summing over axes | Many built-in functions and list methods
Better processing speed | Less efficient for processing
Less memory may be used | More memory may be used
Mutable | Mutable
Fixed-size | Can increase in size
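Several of the NumPy features listed in the summary (multi-dimensional slicing, summing over axes, broadcasting, and slice assignment) can be sketched on one small array. The array grid and the other names are illustrative.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Multi-dimensional slicing: rows 0-1, columns 1-2.
block = grid[0:2, 1:3]
print(block.tolist())        # [[1, 2], [5, 6]]

# Summing over axes.
col_sums = grid.sum(axis=0)  # down each column
row_sums = grid.sum(axis=1)  # across each row
print(col_sums.tolist())     # [12, 15, 18, 21]
print(row_sums.tolist())     # [6, 22, 38]

# Broadcasting: the length-4 row is added to each of the 3 rows.
shifted = grid + np.array([10, 20, 30, 40])
print(shifted[0].tolist())   # [10, 21, 32, 43]

# Slice assignment: overwrite the whole first column in one statement.
grid[:, 0] = -1
print(grid[:, 0].tolist())   # [-1, -1, -1]
```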

References

Here are some materials used in preparing this article:

  • Mayer, C., Riaz, Z., & Rieger, L. (2018). Coffee Break NumPy: A Simple Road to Data Science Mastery That Fits Into Your Busy Life.
  • https://numpy.org/doc/stable/user/whatisnumpy.html
  • https://www.w3schools.com/python/numpy/numpy_ufunc.asp
  • https://www.intel.com/content/www/us/en/developer/articles/technical/vectorization-a-key-tool-to-improve-performance-on-modern-cpus.html