The Sieve of Eratosthenes in One Line of Python

Finding prime numbers is of critical importance for practical applications such as cryptography. Many public-key methods are only safe (from a cryptographic point of view) because it’s generally inefficient and slow to compute the prime factors of large numbers.

A Simple Algorithm to Check Whether a Number is Prime

A prime number n is an integer number that is not divisible without a remainder by any other (integer) number apart from 1 and n. In other words, there are no two integers a and b such that their product equals the prime number: a * b = n.

Say you want to check for a certain number n whether it is a prime number. How do you accomplish this? Let’s start with a naïve algorithm to determine prime numbers:

def prime(n):
    for i in range(2,n):
        if n % i == 0:
            return False
    return True


print(prime(10))
# False

print(prime(11))
# True

print(prime(7919))
# True

The algorithm checks for all numbers between 2 and n whether this number is a divisor of the number n. For example, if you want to determine whether number n=10 is a prime number, the algorithm will quickly realize that for i=2, the result of the expression n % i == 0 is True. It has found a number i that is a divisor of n, so n cannot be a prime number. Therefore, the algorithm leaves the function and returns False.

Above algorithm tests for a single number n whether it is prime. The time complexity is linear in the input n: the algorithm needs n loop iterations (worst-case) to check whether number n is a prime number. But what if you want to calculate all prime numbers from 2 to a certain maximal number m? Simple, you repeat above prime test m-1 times:

# Find all prime numbers <m
m = 20
primes = [n for n in range(2,m) if prime(n)]

print(primes)
# [2, 3, 5, 7, 11, 13, 17, 19]

We use list comprehension (see Chapter 3) to create a list with all prime numbers smaller than m. Because of the for loop, this algorithm requires m-1 function calls of is_prime(n). So, time complexity is bounded by (m-1) * n < m**2. In other words, to find all prime numbers smaller than m = 100 takes up to m**2 = 10000 operations! The number of operations grows quadratically with the input m.

Is there another way?

The Sieve of Eratosthenes in One Line of Python Code

Write an algorithm that is more efficient than above naïve implementation to find all prime numbers up to a maximal integer number m. This one-liner is inspired by an ancient algorithm called “the Sieve of Eratosthenes” which will be explained in the remainder of this section.

## Dependencies
from functools import reduce

## The Data
n=100

## The One-Liner
primes = reduce(lambda r, x: r - set(range(x**2, n, x)) if x in r else r, range(2, int(n**0.5) + 1), set(range(2,n)))

## The Result
print(primes)
# {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97}

Listing: One-liner solution implementing the Sieve of Eratosthenes.

If you are not totally confused by this one-liner, your name is probably Guido Van Rossum (the creator of Python). The rest of us may need some background knowledge to be able to understand what happens here.

How It Works

To be frank, this one-liner is confusing, complex, and unreadable. Still, this is the type of code you are facing in practice and with this book, I want to ensure that you are able to understand every single line of code – even if it takes some time. Need proof that people write code like this in practice? I stumbled upon this one-liner at StackOverflow. It is loosely based on an ancient algorithm called The Sieve of Eratosthenes. The Sieve of Eratosthenes is an ancient and still very popular algorithm to calculate prime numbers.

Before we dive into the code, let’s try to grasp the idea of the algorithm first. The algorithm is extremely simple: it creates (conceptually) a huge array of numbers from 2 to m, the maximal integer number. Then, it repeatedly marks numbers in this array that are not prime numbers. After the algorithm terminates, all unmarked numbers are prime numbers.

To achieve this objective, the algorithm repeats the following steps:

  • Start with the first number 2 and increment it in every step of the process until you find an unmarked number x that is prime.
  • Mark all multiples of number x because they are not prime: number x is a divisor of all those numbers.
  • Simple Optimization: Start marking the multiples from number x*x instead of 2x. The reason is that all numbers between 2x and x*x are already marked (see below).

Visual Algorithm Explained

Here’s a visual step-by-step example of the algorithm:

Figure: Initially all numbers between 2 and m=100 are unmarked (white cells). The first unmarked number 2 is a prime number.

Figure: Mark all multiples of 2 because they are not prime. Ignore the marked numbers for the rest of the algorithm.

Figure: Go to the next unmarked number 3. Because it is unmarked at this point, it is a prime number. Then, mark all multiples of 3. Start marking from number 3*3 because all multiples of 3 between 3 and 3*3=9 are already marked.

Figure: Go to the next unmarked number 5 (which is a prime number). Then, mark all multiples of 5. Start marking from number 5*5 because all multiples of 5 between 5 and 5*5=25 are already marked.

Figure: Go to the next unmarked number 7 (which is a prime number). Then, mark all multiples of 7. Start marking from number 7*7 because all multiples of 7 between 7 and 7*7=49 are already marked.

Figure: Go to the next unmarked number 11 (which is a prime number). Then, mark all multiples of 11. As we would start marking from number 11*11=121, we realize that this is already larger than our maximal number m=100. Hence, the algorithm has terminated. All remaining unmarked numbers are not divisible by any number and are, therefore, prime numbers.

Putting It All Together

This algorithm is much more efficient than the naïve algorithm to compute all primes up to a certain number m. Why? Because the naïve algorithm checks for each number independently whether it is a prime number – ignoring all previous computations. In contrast to that, the Sieve of Eratosthenes reuses results from previous computational steps – a common idea in many areas of algorithmic optimization. Each time, we cross out multiples of a prime number, we essentially save us from the tedious work of checking whether this multiple is a prime number or not: we already know that it isn’t.

A good question is why we start marking from the squared prime number instead of the prime number itself. For example, in the above algorithm in Figure 8-8, we just found prime number 7 and start marking from number 7*7 = 49. The reason is that we already marked all other multiples in previous iterations: 2*7, 3*7, 4*7, 5*7, 6*7. In other words, we have already marked all multiples of numbers that are smaller than the current prime number 7: 2, 3, 4, 5, 6.

Equipped with a thorough conceptual understanding of the algorithm, we can now start unveiling the one-liner solution:

## The One-Liner
primes = reduce(lambda r, x: r - set(range(x**2, n, x)) if x in r else r, range(2, int(n**0.5) + 1), set(range(2,n)))

It’s very elegant but you need to invest some time to understand it. The reduce function takes three arguments: reduce(function, iterable, initializer). Here’s the relevant description from the documentation:

    “Apply function of two arguments cumulatively to the items of sequence, from left to right, so as to reduce the sequence to a single value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). The left argument, x, is the accumulated value and the right argument, y, is the update value from the sequence. If the optional initializer is present, it is placed before the items of the sequence in the calculation, and serves as a default when the sequence is empty. If initializer is not given and sequence contains only one item, the first item is returned.”

Documentation

The one-liner uses the reduce function to remove, one step at a time, all “marked” numbers from the initial set of all numbers between 2 and n (in the one-liner: set(range(2, n))). It takes this set as the initial value for the set of unmarked values r because initially, all values are unmarked. Now it goes over all numbers x between 2 and the square root of n (in the one-liner: range(2, int(n**0.5) + 1)) and removes the multiples of x from the set r (starting at x**2) – but only if the number x is a prime number (i.e., it is not removed from the set r at this point in time). Spend 5-15 minutes to reread this explanation and study the different parts of the one-liner carefully – I promise that after your initial confusion, you will find this exercise to be worth your invested time for that you’ve significantly progressed your Python code understanding skills.


Want to accelerate your Python skills and become a next-level coder? Becoming a Python master could easily be the most profitable decision of your career!

Leave a Comment

Your email address will not be published. Required fields are marked *