How to Create a Dictionary From Two NumPy Arrays?

Anyone working with lists of data will encounter a need to combine them in a useful way. Often the best result is a dictionary consisting of keys and values. In this article, you’ll learn how to create a dictionary from two NumPy arrays.

Problem Formulation: Given two NumPy arrays a and b, create a dictionary that maps key a[i] to value b[i] for all i.

Example: Given two NumPy arrays

a = np.array([1, 42, 0])
b = np.array(['Alice', 'Bob', 'Liz'])

Create a new dictionary programmatically that assigns the elements in a to the elements in b, element-wise:

{1: 'Alice',
 42: 'Bob',
 0: 'Liz'}

After some background on NumPy arrays, you’ll learn multiple methods to accomplish this.

Background: NumPy for the Array

NumPy, short for ‘Numerical Python’, is a Python library for working with arrays. Python users can use standard lists as arrays, but NumPy is usually faster for numerical work because the array items are stored in contiguous memory and share a single data type. This lets NumPy run whole-array operations in compiled code rather than having to scramble across the memory space to find each separately stored item.

If Python and pip are already installed on our system, then installing NumPy takes a single command:

pip install numpy

Creating a NumPy array is as simple as importing the NumPy library and calling the array() function. NumPy is often imported under the np alias:

import numpy as np
planet = np.array(['Mercury', 'Venus', 'Earth', 'Mars'])
orbitalPeriod = np.array([88.0, 224.7, 365.2, 687.0])

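Unlike Python’s standard lists, which can hold different data types in a single list, a NumPy array is homogeneous: every element shares one data type. If we pass in mixed types, NumPy converts them to a common type (numbers mixed with strings all become strings, for example), and we lose the mathematical efficiency built into a NumPy array.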
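
A small sketch can make the homogeneity point concrete. The variable names homogeneous and mixed are illustrative assumptions, and the mixed array is deliberately contrived to show the conversion:

```python
import numpy as np

# All floats: NumPy keeps a numeric dtype.
homogeneous = np.array([88.0, 224.7, 365.2, 687.0])

# Numbers mixed with a string: every element is converted to a string.
mixed = np.array([1, 'Venus', 3.5])

print(homogeneous.dtype)   # float64
print(mixed.dtype.kind)    # U  (Unicode string)
print(mixed.tolist())      # ['1', 'Venus', '3.5']
```

Once the numbers have become strings, arithmetic such as summing the array is no longer available without converting back.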

Method 1: Zip Them Up

Having created two arrays, we can then use Python’s zip() function to merge them into a dictionary. The zip() function lives in Python’s built-in namespace, so no import is needed. If we use dir() to view __builtins__ we find zip at the end of the list:

['ArithmeticError', 'AssertionError'...,'vars', 'zip']

The zip() function makes an iterator that merges items from each of the iterable arrays, just like the interlocking teeth of a zipper on a pair of jeans. In fact, the zip() function was named for a physical zipper.
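
To see what the iterator actually yields, here is a minimal sketch with two plain lists. The names letters, numbers, pairs, first, and rest are illustrative only:

```python
letters = ['a', 'b', 'c']
numbers = [1, 2, 3]

# zip() is lazy: no pairs are built until something asks for them.
pairs = zip(letters, numbers)

first = next(pairs)   # pull a single pair on demand
rest = list(pairs)    # consume whatever is left

print(first)  # ('a', 1)
print(rest)   # [('b', 2), ('c', 3)]
```

Each item is a tuple holding one element from each input, taken from the same position. With the planet and orbitalPeriod arrays from above, a short loop collects those pairs into a dictionary: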

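# Pair each planet with its orbital period and store the pair in the dictionary.
d = {}
for name, period in zip(planet, orbitalPeriod):
    d[name] = period

# {'Mercury': 88.0, 'Venus': 224.7, 'Earth': 365.2, 'Mars': 687.0}
# (NumPy 2.x displays the same entries wrapped as np.str_(...) and np.float64(...).)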

The zip() function pairs elements strictly by position, in left-to-right order: the first element of one array with the first element of the other, and so on. There is no need to worry that the elements in the arrays will be mixed up as they are combined into the dictionary. Otherwise the dictionary would be useless, as the keys would not align properly with their values.
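
The loop can also be collapsed into a single call, because dict() accepts any iterable of key-value pairs. The sketch below additionally calls .tolist() on each array first. That step is an optional extra, not part of the method above; it converts NumPy scalars into plain Python str and float objects. The names periods and native_periods are illustrative:

```python
import numpy as np

planet = np.array(['Mercury', 'Venus', 'Earth', 'Mars'])
orbitalPeriod = np.array([88.0, 224.7, 365.2, 687.0])

# dict() consumes the (key, value) pairs produced by zip() directly.
periods = dict(zip(planet, orbitalPeriod))

# Optional: convert to native Python types before building the dictionary.
native_periods = dict(zip(planet.tolist(), orbitalPeriod.tolist()))

print(native_periods)
# {'Mercury': 88.0, 'Venus': 224.7, 'Earth': 365.2, 'Mars': 687.0}

# The values differ only in type, not in content.
print(type(next(iter(periods.values()))).__name__)         # float64
print(type(next(iter(native_periods.values()))).__name__)  # float
```

Native types are often the safer choice when the dictionary will be serialized, for example with the json module, which does not know how to encode NumPy scalar types such as np.int64.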

Method 2: Arrays of Unequal Lengths

In some cases, our arrays may be of unequal lengths, meaning that one array has more elements than the other. If so, then using the zip() function to merge them will result in a dictionary whose length matches the shorter array. Here’s an example of the brightest stars in the Pleiades cluster with their apparent magnitudes:

stars = np.array(['Alcyone', 'Atlas', 'Electra',
                  'Maia', 'Merope', 'Taygeta', 'Pleione'])
magnitude = np.array([2.86, 3.62, 3.70, 3.86, 4.17, 4.29])
cluster = {}

# zip() stops as soon as the shorter array (magnitude) runs out.
for star, mag in zip(stars, magnitude):
    cluster[star] = mag
# {'Alcyone': 2.86, 'Atlas': 3.62, 'Electra': 3.7, 'Maia': 3.86, 'Merope': 4.17, 'Taygeta': 4.29}

As we can see, the ‘stars’ array contained seven of the brightest stars in the Pleiades, the cluster popularly known as the Seven Sisters. The ‘magnitude’ array, however, listed only the first six values for apparent magnitude. When the zip() function merged the two arrays, the seventh star was dropped entirely, without any warning.
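
Because the truncation is silent, it can help to check the lengths before zipping. The helper below, dict_from_arrays, is a hypothetical function written for illustration, not part of NumPy or the standard library. On Python 3.10 and later, zip(stars, magnitude, strict=True) offers a similar safeguard by raising ValueError when the inputs differ in length.

```python
import numpy as np

def dict_from_arrays(keys, values):
    """Build a dict from two arrays, refusing mismatched lengths.

    Hypothetical helper for illustration only.
    """
    if len(keys) != len(values):
        raise ValueError(
            f"length mismatch: {len(keys)} keys vs {len(values)} values"
        )
    # .tolist() converts NumPy scalars to plain Python objects.
    return dict(zip(keys.tolist(), values.tolist()))

stars = np.array(['Alcyone', 'Atlas', 'Electra',
                  'Maia', 'Merope', 'Taygeta', 'Pleione'])
magnitude = np.array([2.86, 3.62, 3.70, 3.86, 4.17, 4.29])

try:
    cluster = dict_from_arrays(stars, magnitude)
except ValueError as err:
    print(err)  # length mismatch: 7 keys vs 6 values
```

Without such a check, zip() simply discards the unmatched element.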

Depending on our needs, this may be acceptable. If not, we can use the zip_longest() function from the itertools module instead of the zip() function. It keeps going until the longest input is exhausted and pads the shorter one with the value passed as the fillvalue argument. We can supply any value we want; the default is None.

Let’s create the cluster dictionary again:

from itertools import zip_longest

cluster = {}

# zip_longest() pads the shorter array with fillvalue instead of stopping early.
for star, mag in zip_longest(stars, magnitude, fillvalue='?'):
    cluster[star] = mag

# {'Alcyone': 2.86, 'Atlas': 3.62, 'Electra': 3.7, 'Maia': 3.86, 'Merope': 4.17, 'Taygeta': 4.29, 'Pleione': '?'}

This time all seven stars are listed, and the unknown magnitude of the last one is marked with a question mark, perhaps to be filled in later.
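
A string placeholder mixes types among the values, which can get in the way of later arithmetic. One possible variation, shown here as a sketch rather than as the recommended approach, is to pad with NaN so that every value stays a float. The choice of float('nan') and the name missing are assumptions made for this example:

```python
import math
from itertools import zip_longest

import numpy as np

stars = np.array(['Alcyone', 'Atlas', 'Electra',
                  'Maia', 'Merope', 'Taygeta', 'Pleione'])
magnitude = np.array([2.86, 3.62, 3.70, 3.86, 4.17, 4.29])

# Pad the shorter input with NaN so all values remain floats.
cluster = dict(zip_longest(stars.tolist(), magnitude.tolist(),
                           fillvalue=float('nan')))

# NaN never compares equal to anything, so test for it with math.isnan().
missing = [name for name, mag in cluster.items() if math.isnan(mag)]
print(missing)  # ['Pleione']
```

Note that fillvalue pads whichever input is shorter. If the key array were the shorter one, every padded entry would receive the same fill key, and later entries would overwrite earlier ones in the dictionary.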

By combining NumPy’s memory-efficient arrays with the zip() or zip_longest() functions’ ease of use as an iterator, we can quickly and simply create dictionaries from two arrays with a minimum of fuss.