Initialize a Huge Python Dict of Size N (Easy & Quick)

⚔️ Programming Challenge: Given an integer n that could be very large (e.g., n=1000000), how do you initialize a Python dictionary of size n in a way that is fast, easy, and efficient?

Next, you’ll learn five ways to solve this and compare their performance at the end of this article. Interestingly, the winner, Method 4, takes roughly 37% less time than the slowest, Method 5, and roughly 14% less time than the runner-up, Method 2.

Scroll down to see the winning method, which is both easy and efficient! 🚀

Method 1: Basic For Loop

A simple and straightforward, though not very concise, way to create and initialize a dictionary of size n is to fill an initially empty dictionary in a for loop, using assignments such as d[i] = None in the loop body.

Here’s a simple code snippet using this approach:

def init_dict_1(n):
    ''' Initialize a dictionary with n key-value pairs. '''
    d = {}
    for i in range(n):
        d[i] = None
    return d

The output:

print(init_dict_1(100))

{0: None, 1: None, 2: None, 3: None, 4: None, 5: None, 6: None, 7: None, 8: None, 9: None, 10: None, 11: None, 12: None, 13: None, 14: None, 15: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None, 23: None, 24: None, 25: None, 26: None, 27: None, 28: None, 29: None, 30: None, 31: None, 32: None, 33: None, 34: None, 35: None, 36: None, 37: None, 38: None, 39: None, 40: None, 41: None, 42: None, 43: None, 44: None, 45: None, 46: None, 47: None, 48: None, 49: None, 50: None, 51: None, 52: None, 53: None, 54: None, 55: None, 56: None, 57: None, 58: None, 59: None, 60: None, 61: None, 62: None, 63: None, 64: None, 65: None, 66: None, 67: None, 68: None, 69: None, 70: None, 71: None, 72: None, 73: None, 74: None, 75: None, 76: None, 77: None, 78: None, 79: None, 80: None, 81: None, 82: None, 83: None, 84: None, 85: None, 86: None, 87: None, 88: None, 89: None, 90: None, 91: None, 92: None, 93: None, 94: None, 95: None, 96: None, 97: None, 98: None, 99: None}

👉 Recommended Tutorial: Adding Elements to a Python Dictionary

Method 2: Dictionary Comprehension

Dictionary Comprehension is a concise and memory-efficient way to create and initialize dictionaries in one line of Python code. It consists of two parts: expression and context.

  • The expression defines how to map keys to values.
  • The context loops over an iterable using a single-line for loop and defines which key:value pairs to include in the new dictionary.

Here’s how you can use dictionary comprehension to create and initialize a dictionary of size n:

def init_dict_2(n):
    ''' Initialize a dictionary with n key-value pairs. '''
    return {i:None for i in range(n)}

The output:

print(init_dict_2(100))

{0: None, 1: None, 2: None, 3: None, 4: None, 5: None, 6: None, 7: None, 8: None, 9: None, 10: None, 11: None, 12: None, 13: None, 14: None, 15: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None, 23: None, 24: None, 25: None, 26: None, 27: None, 28: None, 29: None, 30: None, 31: None, 32: None, 33: None, 34: None, 35: None, 36: None, 37: None, 38: None, 39: None, 40: None, 41: None, 42: None, 43: None, 44: None, 45: None, 46: None, 47: None, 48: None, 49: None, 50: None, 51: None, 52: None, 53: None, 54: None, 55: None, 56: None, 57: None, 58: None, 59: None, 60: None, 61: None, 62: None, 63: None, 64: None, 65: None, 66: None, 67: None, 68: None, 69: None, 70: None, 71: None, 72: None, 73: None, 74: None, 75: None, 76: None, 77: None, 78: None, 79: None, 80: None, 81: None, 82: None, 83: None, 84: None, 85: None, 86: None, 87: None, 88: None, 89: None, 90: None, 91: None, 92: None, 93: None, 94: None, 95: None, 96: None, 97: None, 98: None, 99: None}
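
To make the two parts of a dictionary comprehension more concrete, here’s a minimal sketch. The squared values and the even-number filter are illustrative assumptions and are not needed for the initialization task itself:

```python
n = 10

# Expression: `i: i * i` maps each key i to its square.
# Context: `for i in range(n) if i % 2 == 0` loops over 0..n-1
# and keeps only the even keys.
even_squares = {i: i * i for i in range(n) if i % 2 == 0}

print(even_squares)
# {0: 0, 2: 4, 4: 16, 6: 36, 8: 64}
```

Replacing the expression with i: None and dropping the filter gives exactly the one-liner used in init_dict_2().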

👉 Recommended Tutorial: Python Dictionary Comprehension: A Powerful One-Liner Tutorial

Method 3: zip() and range()

The zip() function takes an arbitrary number of iterables and aggregates them to a single iterable, a zip object. It combines the i-th values of each iterable argument into a tuple. Hence, if you pass two iterables, each tuple will contain two values.
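
The following small sketch shows how zip() pairs up the i-th values of two iterables. The sample values 'a', 'b', and 'c' are made up for illustration:

```python
keys = range(3)
values = ['a', 'b', 'c']

# zip() returns a lazy zip object; list() materializes the tuples.
pair_list = list(zip(keys, values))
print(pair_list)
# [(0, 'a'), (1, 'b'), (2, 'c')]

# zip() stops as soon as the shortest iterable is exhausted.
short = list(zip(range(5), ['x', 'y']))
print(short)
# [(0, 'x'), (1, 'y')]
```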

You can use the zip() function to construct a dictionary by first using it to create an iterable of tuples using zip(range(n), [None] * n) where the first tuple values are the keys and the second tuple values are None. Then pass the result into the dict() function to create a dictionary out of it.

Here’s what this one-liner solution looks like:

def init_dict_3(n):
    ''' Initialize a dictionary with n key-value pairs. '''
    return dict(zip(range(n), [None] * n))

The output:

print(init_dict_3(100))

{0: None, 1: None, 2: None, 3: None, 4: None, 5: None, 6: None, 7: None, 8: None, 9: None, 10: None, 11: None, 12: None, 13: None, 14: None, 15: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None, 23: None, 24: None, 25: None, 26: None, 27: None, 28: None, 29: None, 30: None, 31: None, 32: None, 33: None, 34: None, 35: None, 36: None, 37: None, 38: None, 39: None, 40: None, 41: None, 42: None, 43: None, 44: None, 45: None, 46: None, 47: None, 48: None, 49: None, 50: None, 51: None, 52: None, 53: None, 54: None, 55: None, 56: None, 57: None, 58: None, 59: None, 60: None, 61: None, 62: None, 63: None, 64: None, 65: None, 66: None, 67: None, 68: None, 69: None, 70: None, 71: None, 72: None, 73: None, 74: None, 75: None, 76: None, 77: None, 78: None, 79: None, 80: None, 81: None, 82: None, 83: None, 84: None, 85: None, 86: None, 87: None, 88: None, 89: None, 90: None, 91: None, 92: None, 93: None, 94: None, 95: None, 96: None, 97: None, 98: None, 99: None}

👉 Recommended Tutorial: Understanding the zip() function in Python

Method 4: dict.fromkeys()

The dict.fromkeys() method creates a new dictionary from a given iterable of keys. It takes the keys and an optional value, and returns a dictionary that maps every key to that value, or to None if no value is given.

def init_dict_4(n):
    ''' Initialize a dictionary with n key-value pairs. '''
    return dict.fromkeys(range(n))

The output:

print(init_dict_4(100))

{0: None, 1: None, 2: None, 3: None, 4: None, 5: None, 6: None, 7: None, 8: None, 9: None, 10: None, 11: None, 12: None, 13: None, 14: None, 15: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None, 23: None, 24: None, 25: None, 26: None, 27: None, 28: None, 29: None, 30: None, 31: None, 32: None, 33: None, 34: None, 35: None, 36: None, 37: None, 38: None, 39: None, 40: None, 41: None, 42: None, 43: None, 44: None, 45: None, 46: None, 47: None, 48: None, 49: None, 50: None, 51: None, 52: None, 53: None, 54: None, 55: None, 56: None, 57: None, 58: None, 59: None, 60: None, 61: None, 62: None, 63: None, 64: None, 65: None, 66: None, 67: None, 68: None, 69: None, 70: None, 71: None, 72: None, 73: None, 74: None, 75: None, 76: None, 77: None, 78: None, 79: None, 80: None, 81: None, 82: None, 83: None, 84: None, 85: None, 86: None, 87: None, 88: None, 89: None, 90: None, 91: None, 92: None, 93: None, 94: None, 95: None, 96: None, 97: None, 98: None, 99: None}
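
dict.fromkeys() is not limited to range() objects. As a short sketch, here it is with a list of string keys and an explicit value. The names and the value 0 are hypothetical and chosen only for illustration:

```python
# Hypothetical keys, used only to illustrate the call.
names = ['alice', 'bob', 'carol']

# With a second argument, every key maps to that value.
scores = dict.fromkeys(names, 0)
print(scores)
# {'alice': 0, 'bob': 0, 'carol': 0}

# Without a second argument, every key maps to None.
unset = dict.fromkeys(names)
print(unset)
# {'alice': None, 'bob': None, 'carol': None}
```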

👉 Recommended Tutorial: Python’s Dictionary fromkeys() Method

Method 5: Simple While Loop

You can also use a simple while loop to create a large dictionary one mapping at a time. Unlike the for loop (see Method 1), it does not rely on the range() function; instead, it increments and compares the counter i with explicit integer operations. As the performance evaluation below shows, this turned out to be the slowest of the five methods.

Here’s the solution using while:

def init_dict_5(n):
    ''' Initialize a dictionary with n key-value pairs. '''
    d = dict()
    i = 0
    while i<n:
        d[i] = None
        i += 1
    return d

The output:

print(init_dict_5(100))

{0: None, 1: None, 2: None, 3: None, 4: None, 5: None, 6: None, 7: None, 8: None, 9: None, 10: None, 11: None, 12: None, 13: None, 14: None, 15: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None, 23: None, 24: None, 25: None, 26: None, 27: None, 28: None, 29: None, 30: None, 31: None, 32: None, 33: None, 34: None, 35: None, 36: None, 37: None, 38: None, 39: None, 40: None, 41: None, 42: None, 43: None, 44: None, 45: None, 46: None, 47: None, 48: None, 49: None, 50: None, 51: None, 52: None, 53: None, 54: None, 55: None, 56: None, 57: None, 58: None, 59: None, 60: None, 61: None, 62: None, 63: None, 64: None, 65: None, 66: None, 67: None, 68: None, 69: None, 70: None, 71: None, 72: None, 73: None, 74: None, 75: None, 76: None, 77: None, 78: None, 79: None, 80: None, 81: None, 82: None, 83: None, 84: None, 85: None, 86: None, 87: None, 88: None, 89: None, 90: None, 91: None, 92: None, 93: None, 94: None, 95: None, 96: None, 97: None, 98: None, 99: None}

👉 Recommended Tutorial: Understanding Python Loops from the Ground Up

Performance Evaluation

We used an Intel Core i7 (1.8 GHz, Turbo Boost up to 4.6 GHz) with 8 GB of DDR4 memory and 512 GB of storage (not that it mattered) to compare the five methods for various values of n, increasing n exponentially as shown in the code below.

This allowed us to stress-test the dictionary creation functions discussed in this article on large inputs, generating dictionaries with up to 10 million (!) entries. 😮

⚡ Experiment Results: The output shows that Method 4 is the fastest and scaled best, followed by Method 2, then Methods 1 and 3 (practically tied), and finally Method 5 (the slowest). The winner, Method 4, takes roughly 37% less time than the slowest, Method 5, and roughly 14% less time than the runner-up, Method 2.

  • Method 1 needed 0.69 seconds for 10 million dict entries.
  • Method 2 needed 0.68 seconds for 10 million dict entries.
  • Method 3 needed 0.69 seconds for 10 million dict entries.
  • Method 4 needed 0.58 seconds for 10 million dict entries.
  • Method 5 needed 0.93 seconds for 10 million dict entries.

We used the following code to measure the runtimes and plot them:

import time
import matplotlib.pyplot as plt


def init_dict_1(n):
    ''' Initialize a dictionary with n key-value pairs. '''
    d = {}
    for i in range(n):
        d[i] = None
    return d


def init_dict_2(n):
    ''' Initialize a dictionary with n key-value pairs. '''
    return {i:None for i in range(n)}


def init_dict_3(n):
    ''' Initialize a dictionary with n key-value pairs. '''
    return dict(zip(range(n), [None] * n))


def init_dict_4(n):
    ''' Initialize a dictionary with n key-value pairs. '''
    return dict.fromkeys(range(n))


def init_dict_5(n):
    ''' Initialize a dictionary with n key-value pairs. '''
    d = dict()
    i = 0
    while i<n:
        d[i] = None
        i += 1
    return d


# Performance Evaluation
xs = []
y_1, y_2, y_3, y_4, y_5 = [], [], [], [], []
for x in [10**i for i in range(3, 8)]:
    xs.append(x)

    # Method 1 Elapsed Runtime:
    start = time.time()
    init_dict_1(x)
    stop = time.time()
    y_1.append(stop - start)
    
    # Method 2 Elapsed Runtime:
    start = time.time()
    init_dict_2(x)
    stop = time.time()
    y_2.append(stop - start)

    # Method 3 Elapsed Runtime:
    start = time.time()
    init_dict_3(x)
    stop = time.time()
    y_3.append(stop - start)

    # Method 4 Elapsed Runtime:
    start = time.time()
    init_dict_4(x)
    stop = time.time()
    y_4.append(stop - start)

    # Method 5 Elapsed Runtime:
    start = time.time()
    init_dict_5(x)
    stop = time.time()
    y_5.append(stop - start)


print(y_1)
print(y_2)
print(y_3)
print(y_4)
print(y_5)

plt.plot(xs, y_1, '.-', label='Method 1')
plt.plot(xs, y_2, 'o-', label='Method 2')
plt.plot(xs, y_3, 'x-', label='Method 3')
plt.plot(xs, y_4, 'v--', label='Method 4')
plt.plot(xs, y_5, '.--', label='Method 5')

plt.xscale('log')
plt.legend()
plt.grid()
plt.show()

In case you need the exact values in seconds, here’s the output, one line per method, with one value for each n from 10**3 to 10**7:

[0.0, 0.0, 0.008193492889404297, 0.07302451133728027, 0.6917409896850586]
[0.0, 0.0009975433349609375, 0.006968975067138672, 0.07086825370788574, 0.6777770519256592]
[0.0, 0.0, 0.008328437805175781, 0.07159566879272461, 0.6925091743469238]
[0.000997304916381836, 0.0, 0.006980180740356445, 0.06289315223693848, 0.5841073989868164]
[0.0, 0.0009970664978027344, 0.009291648864746094, 0.09641242027282715, 0.9321954250335693]

You can see that for the largest n=10000000, we obtain a runtime of 0.93 seconds for Method 5 and only 0.58 seconds for Method 4.
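
Several entries for small n are exactly 0.0. A likely explanation (an assumption, not something we verified) is that time.time() has too coarse a resolution on the test system to capture such short runs. If you want finer measurements, a minimal sketch could use time.perf_counter() and take the best of several runs. The helper best_runtime() and its repeats parameter are hypothetical and not part of the original experiment:

```python
import time


def init_dict_4(n):
    ''' Initialize a dictionary with n key-value pairs. '''
    return dict.fromkeys(range(n))


def best_runtime(func, n, repeats=5):
    ''' Hypothetical helper: smallest elapsed time of func(n) over several runs. '''
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()  # high-resolution clock for short intervals
        func(n)
        stop = time.perf_counter()
        timings.append(stop - start)
    return min(timings)


# Smaller values of n than in the experiment above, to keep the sketch quick.
for n in [10**i for i in range(3, 6)]:
    print(n, best_runtime(init_dict_4, n))
```

Taking the minimum reduces the influence of background activity on the machine. The numbers will differ from the ones reported above.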

Summary

The easiest and fastest way to create and initialize a dictionary with n elements is dict.fromkeys(range(n)), which maps each integer i to the default value None. If you need another default value (such as 42), just pass it as a second argument to the method.
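
One caveat worth knowing when you pass a second argument: dict.fromkeys() assigns the very same object to every key. That is harmless for immutable values such as 42 or None, but with a mutable value such as a list, all keys share one list. The sketch below illustrates this and shows dictionary comprehension (Method 2) as one possible alternative:

```python
n = 3

# All keys share the SAME list object.
shared = dict.fromkeys(range(n), [])
shared[0].append('x')
print(shared)
# {0: ['x'], 1: ['x'], 2: ['x']}

# A dictionary comprehension evaluates [] once per key,
# so every key gets its own list.
separate = {i: [] for i in range(n)}
separate[0].append('x')
print(separate)
# {0: ['x'], 1: [], 2: []}
```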