NumPy vs Lists
NumPy is an important Python library used for numerical operations and data science.
Its mathematical operations on arrays are faster and more efficient than the same operations on lists. Because NumPy's core is written mostly in C, these operations run as compiled loops rather than as interpreted Python code.
NumPy arrays are homogeneous: each array stores a single data type, so functions do not need to check the type of every element, and the data is stored in contiguous memory locations.
Lists, however, can hold multiple types of data, and they store pointers to their elements rather than storing the elements contiguously. NumPy also takes advantage of vectorization, which converts an algorithm that processes one value at a time, as a loop over a list does, into one that operates on a whole set of values (a vector or a matrix) at once.
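The difference between heterogeneous lists and homogeneous arrays can be seen in a minimal sketch like the following. The variable names are chosen for illustration, and the integer dtype is set explicitly because the default integer width depends on the platform.

```python
import numpy as np

# A list keeps each element as its own Python object, so types can differ.
mixed_list = [1, 2.5, "three"]
list_types = [type(x).__name__ for x in mixed_list]
print(list_types)  # ['int', 'float', 'str']

# An array has exactly one dtype shared by all of its elements.
int_array = np.array([1, 2, 3], dtype=np.int64)
print(int_array.dtype)  # int64

# Mixed numeric input is upcast to a common type instead of being kept as-is.
upcast_array = np.array([1, 2.5, 3])
print(upcast_array.dtype)  # float64
```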
Example
For example, NumPy uses ufuncs, or “Universal Functions”, that operate element by element on the ndarray (N-dimensional array) object, where N is the number of dimensions.
When multiplying two lists element by element in plain Python, a loop is used.
    a = [1, 2, 3, 4]
    b = [5, 6, 7, 8]
    ab = []
    for i in range(0, len(a)):
        ab.append(a[i] * b[i])
When using NumPy, the loop is replaced by a single vector-based operation.
    import numpy as np

    a = [1, 2, 3, 4]
    b = [5, 6, 7, 8]
    # Element-wise product; returns array([ 5, 12, 21, 32])
    np.multiply(a, b)
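As a quick check, the loop and the vectorized call can be compared side by side. This is a small sketch with illustrative variable names; it also shows that, for ndarray operands, the `*` operator performs the same element-wise multiplication.

```python
import numpy as np

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]

# Loop-based element-wise product, equivalent to the list example.
loop_result = [x * y for x, y in zip(a, b)]

# Vectorized product: np.multiply accepts lists and returns an ndarray.
vector_result = np.multiply(a, b)

# With ndarray operands, the * operator is also element-wise.
operator_result = np.array(a) * np.array(b)

print(loop_result)               # [5, 12, 21, 32]
print(vector_result.tolist())    # [5, 12, 21, 32]
print(operator_result.tolist())  # [5, 12, 21, 32]
```

Note that `a * b` on the two plain lists would raise a `TypeError`, which is why the list version needs an explicit loop.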
Single Instruction Multiple Data (SIMD)
Modern CPUs support vectorization with SIMD (Single Instruction Multiple Data).
NumPy's compiled functions process whole arrays at a time, which lets them take advantage of the wide SIMD registers of modern CPUs. Depending on how NumPy was built, some operations, such as linear algebra routines backed by a multithreaded BLAS library, can also use multiple cores.
This data-parallel design improves performance when processing large amounts of data and has applications in many fields, such as science and finance.
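The combined effect of compiled loops and any SIMD support can be observed with a rough timing sketch such as the one below. The problem size and repeat count are arbitrary values chosen for illustration, and the measured times depend on the CPU and on how NumPy was built, so no particular speedup should be assumed.

```python
import timeit

import numpy as np

n = 100_000  # arbitrary problem size, chosen for illustration
values_list = list(range(n))
values_array = np.arange(n, dtype=np.int64)


def square_with_loop():
    # One Python-level multiplication per element.
    return [v * v for v in values_list]


def square_with_numpy():
    # One vectorized multiplication over the whole array.
    return values_array * values_array


loop_seconds = timeit.timeit(square_with_loop, number=20)
numpy_seconds = timeit.timeit(square_with_numpy, number=20)
print(f"list loop: {loop_seconds:.4f} s")
print(f"NumPy:     {numpy_seconds:.4f} s")
```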
Memory Comparison
NumPy may use much less memory to store data than regular Python lists.
The following example compares the memory used by the data in a Python list and in a NumPy array. Both hold the same 10 integers.
    import sys

    import numpy as np

    py_arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    numpy_arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

    # Size in bytes of the 10 int objects the list refers to
    # (the list's own array of pointers is not counted here).
    sizeof_py_arr = sys.getsizeof(1) * len(py_arr)
    print(sizeof_py_arr)  # 280 on a typical 64-bit CPython

    # Size in bytes of the NumPy array's data buffer.
    sizeof_numpy_arr = numpy_arr.itemsize * numpy_arr.size
    print(sizeof_numpy_arr)  # 80 when the default integer type is 64-bit
With a 64-bit integer type, the NumPy array uses 8 bytes per integer, so 10 * 8 = 80 bytes hold the data. The integer objects referenced by the Python list take 28 bytes each, or 280 bytes in total, before counting the list's own pointers.
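A slightly fuller accounting also counts the list object itself, which holds the pointers. The sketch below assumes a 64-bit integer dtype, set explicitly so the array size is predictable; the list figures vary by Python version and platform, so only the array size is shown as an expected value.

```python
import sys

import numpy as np

py_list = list(range(1, 11))
np_array = np.array(py_list, dtype=np.int64)  # dtype fixed for a predictable size

# The list object stores the pointers; the integers are separate objects.
list_container_bytes = sys.getsizeof(py_list)
list_element_bytes = sum(sys.getsizeof(x) for x in py_list)
list_total_bytes = list_container_bytes + list_element_bytes

# nbytes is the size of the array's data buffer: itemsize * size.
array_data_bytes = np_array.nbytes

print(list_total_bytes)  # platform dependent
print(array_data_bytes)  # 80
```

The array object also carries a fixed-size header on top of its data buffer, so the saving matters most for large arrays.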
Performance Evaluation
A Python list is an array of pointers that refer to objects stored elsewhere in memory.
my_list = [1, 2, ['Alice', 'Bob'], 'hello world']
Each time we access an element, Python first retrieves the pointer and then follows it to the memory location of the object. This extra indirection causes a significant performance decrease when lists are used for numerical work.
💡 However, NumPy arrays are homogeneous. They store only one type of data in contiguous memory locations, so an element can be located directly from its index with very little overhead.
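The contiguous layout can be inspected through an array's `strides`, the number of bytes to step along each dimension. The following sketch uses an illustrative 3 x 4 array of 64-bit integers and shows that an element's position follows from simple arithmetic on its indices.

```python
import numpy as np

grid = np.arange(12, dtype=np.int64).reshape(3, 4)

print(grid.flags["C_CONTIGUOUS"])  # True
print(grid.itemsize)               # 8 bytes per element
print(grid.strides)                # (32, 8): bytes per row step, per column step

# The byte offset of grid[row, col] is computed directly from the indices.
row, col = 2, 1
byte_offset = row * grid.strides[0] + col * grid.strides[1]
element_from_offset = grid.ravel()[byte_offset // grid.itemsize]
print(element_from_offset == grid[row, col])  # True
```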
Advantages of using NumPy compared to regular Python lists
NumPy arrays
- consume less memory than lists.
- are faster at performing numerical operations.
- support scientific functions, such as linear algebra.
- support element-wise operations on arrays.
- are primarily used to perform operations on numerical data quickly. That is why they are extensively used in machine learning and data analytics.
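Two of the points above, element-wise operations and linear algebra support, can be illustrated with a short sketch. The matrix and right-hand side are made-up values for demonstration.

```python
import numpy as np

matrix = np.array([[2.0, 1.0],
                   [1.0, 3.0]])
rhs = np.array([3.0, 5.0])

# Element-wise: every entry is scaled without an explicit loop.
scaled = matrix * 10
print(scaled.tolist())  # [[20.0, 10.0], [10.0, 30.0]]

# Linear algebra: solve matrix @ x = rhs for x.
solution = np.linalg.solve(matrix, rhs)
print(np.allclose(matrix @ solution, rhs))  # True
```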
Disadvantages of using NumPy compared to regular Python lists
Python lists
- are easier to modify. Appending to or deleting from a list changes it in place. A NumPy array stores its elements in one contiguous block, so adding or deleting elements means allocating a new array and copying the data, which is time-consuming.
- can grow dynamically, whereas NumPy arrays have a fixed size once created.
- are built in. NumPy arrays are not; we have to import an external library to use them.
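The difference in how the two containers grow can be seen in a minimal sketch: `list.append` modifies the list in place, while `np.append` returns a new array and leaves the original unchanged.

```python
import numpy as np

items = [1, 2, 3]
items.append(4)  # grows the same list object in place
print(items)  # [1, 2, 3, 4]

arr = np.array([1, 2, 3])
longer = np.append(arr, 4)  # allocates a new array and copies the data
print(arr.size, longer.size)  # 3 4
print(longer is arr)          # False
```

When the final size is known in advance, a common alternative is to allocate the array once, for example with `np.zeros`, and fill it in.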
Summary
In summary, the functionality of NumPy compared to regular Python Lists is as follows:
NumPy | Regular Python Lists |
One data type per array (typically numerical) | Heterogeneous data of any type |
Allows multi-dimensional slicing | Allows slicing along one dimension only |
Broadcasting possible for operations on arrays of different shapes | Iteration needed for operations on two same-size lists |
Allows slice assignment easily | Slice assignment may be more complicated |
Many convenient methods, e.g. summing over axes | Many built-in functions and list methods |
Better processing speed | Less efficient for processing |
Less memory may be used | More memory may be used |
Mutable | Mutable |
Fixed-size | Can increase in size |
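Several of the NumPy features in the table can be tried on a small array. The sketch below uses made-up values and covers multi-dimensional slicing, broadcasting, summing over axes, and slice assignment.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

# Multi-dimensional slicing: all rows, last two columns.
last_two_columns = table[:, 1:]
print(last_two_columns.tolist())  # [[2, 3], [5, 6]]

# Broadcasting: a shape (3,) row is applied to each row of the (2, 3) array.
shifted = table + np.array([10, 20, 30])
print(shifted.tolist())  # [[11, 22, 33], [14, 25, 36]]

# Summing over axes: axis=0 sums down columns, axis=1 sums across rows.
column_sums = table.sum(axis=0)
row_sums = table.sum(axis=1)
print(column_sums.tolist())  # [5, 7, 9]
print(row_sums.tolist())     # [6, 15]

# Slice assignment: set the whole first column of a copy to zero.
zeroed = table.copy()
zeroed[:, 0] = 0
print(zeroed.tolist())  # [[0, 2, 3], [0, 5, 6]]
```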
References
Here are some materials used in preparing this article:
- Mayer, C., Riaz, Z., & Rieger, L. (2018). Coffee Break NumPy: A Simple Road to Data Science Mastery That Fits Into Your Busy Life.
- https://numpy.org/doc/stable/user/whatisnumpy.html
- https://www.w3schools.com/python/numpy/numpy_ufunc.asp
- https://www.intel.com/content/www/us/en/developer/articles/technical/vectorization-a-key-tool-to-improve-performance-on-modern-cpus.html