Finding prime numbers is of critical importance for practical applications such as cryptography. Many public-key methods are only safe from a cryptographic point of view because it’s generally inefficient and slow to compute the prime factors of large numbers.
As you go over the article, feel free to watch my explainer video on the Sieve of Eratosthenes:
Problem Formulation
A prime number n
is an integer number that is not divisible without a remainder by any other (integer) number apart from 1
and n
. In other words, there are no two integers a
and b
such that their product equals the prime number: a * b = n
.
Say you want to check for a certain number n
whether it is a prime number. How do you accomplish this?
Let’s start with a naïve algorithm to determine prime numbers:
Naive Prime Checker Algorithm in Python
The following algorithm checks for all numbers between 2
and n
whether this number is a divisor of the number n
using the modulo operation:
def prime(n): for i in range(2,n): if n % i == 0: return False return True print(prime(10)) # False print(prime(11)) # True print(prime(7919)) # True
For example, if you want to determine whether number n=10
is a prime number, the algorithm will quickly realize that for i=2
, the result of the modulo expression n % i == 0
is True
. If so, it has found a number i
that is a divisor of n
, so n
cannot be a prime number. Therefore, the algorithm leaves the function and returns False
.
💡 For a detailed recap on the modulo operation, check out my blog tutorial or watch the following video:
The naive prime checker algorithm tests for a single number n
whether it is prime. The time complexity is linear in the input n
: the algorithm needs n
loop iterations (worst-case) to check whether number n
is a prime number.
But what if you want to calculate all prime numbers from 2
to a certain maximal number m
? Simple, you repeat above prime test m-1
times:
# Find all prime numbers <m m = 20 primes = [n for n in range(2,m) if prime(n)] print(primes) # [2, 3, 5, 7, 11, 13, 17, 19]
We use list comprehension to create a list with all prime numbers smaller than m
.
Time complexity considerations: Because of the for loop, this algorithm requires m-1
function calls of is_prime(n)
. So, the time complexity is bounded by (m-1) * n < m**2
. In other words, to find all prime numbers smaller than m = 100
takes up to m**2 = 10000
operations! The number of operations grows quadratically with the input m
.
Is there a better way?
The Sieve of Eratosthenes in Python
Recap problem: Write an algorithm that is more efficient than the above naïve implementation to find all prime numbers up to a maximal integer number m
.
This one-liner is inspired by an ancient algorithm called “the Sieve of Eratosthenes” which will be explained in the remainder of this section.
Note, this one-liner may look scary to you — later in this article, I’ll also give a full code for the Sieve of Eratosthenes. If you need to polish up your one-liner skills, check out my best-selling book Python One-Liners.
## Dependencies from functools import reduce ## The Data n=100 ## The One-Liner primes = reduce(lambda r, x: r - set(range(x**2, n, x)) if x in r else r, range(2, int(n**0.5) + 1), set(range(2,n))) ## The Result print(primes) # {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97}
Listing: One-liner solution implementing the Sieve of Eratosthenes.
If you are not totally confused by this one-liner, your name is probably Guido Van Rossum (the creator of Python). The rest of us may need some background knowledge to be able to understand what happens here.
To be frank, this one-liner is confusing, complex, and unreadable. Still, this is the type of code you are facing in practice, and with this my one-liners, I want to ensure that you are able to understand every single line of code – even if it takes some time. Need proof that people write code like this in practice? I stumbled upon this one-liner at StackOverflow. It is loosely based on an algorithm called The Sieve of Eratosthenes. The Sieve of Eratosthenes is an ancient and still very popular algorithm to calculate prime numbers.
Algorithm Idea
Before we dive into the code, let’s try to grasp the idea of the algorithm first. The algorithm is extremely simple: it creates (conceptually) a huge array of numbers from 2
to m
, the maximal integer number. Then, it repeatedly marks numbers in this array that are not prime numbers. After the algorithm terminates, all unmarked numbers are prime numbers.
To achieve this objective, the algorithm repeats the following steps:
- Start with the first number
2
and increment it in every step of the process until you find an unmarked numberx
that is prime. - Mark all multiples of number
x
because they are not prime: numberx
is a divisor of all those numbers. - Simple Optimization: Start marking the multiples from number
x*x
instead of2x
. The reason is that all numbers between2x
andx*x
are already marked (see below).
Visual Algorithm Explained
Here’s a visual step-by-step example of the algorithm:
Figure: Initially all numbers between 2 and m=100 are unmarked (white cells). The first unmarked number 2 is a prime number.
Figure: Mark all multiples of 2 because they are not prime. Ignore the marked numbers for the rest of the algorithm.
Figure: Go to the next unmarked number 3. Because it is unmarked at this point, it is a prime number. Then, mark all multiples of 3. Start marking from number 3*3 because all multiples of 3 between 3 and 3*3=9 are already marked.
Figure: Go to the next unmarked number 5 (which is a prime number). Then, mark all multiples of 5. Start marking from number 5*5 because all multiples of 5 between 5 and 5*5=25 are already marked.
Figure: Go to the next unmarked number 7 (which is a prime number). Then, mark all multiples of 7. Start marking from number 7*7 because all multiples of 7 between 7 and 7*7=49 are already marked.
Figure: Go to the next unmarked number 11 (which is a prime number). Then, mark all multiples of 11. As we would start marking from number 11*11=121, we realize that this is already larger than our maximal number m=100. Hence, the algorithm has terminated. All remaining unmarked numbers are not divisible by any number and are, therefore, prime numbers.
Putting It All Together
This algorithm is much more efficient than the naïve algorithm to compute all primes up to a certain number m
. Why? Because the naïve algorithm checks for each number independently whether it is a prime number – ignoring all previous computations. In contrast to that, the Sieve of Eratosthenes reuses results from previous computational steps – a common idea in many areas of algorithmic optimization. Each time, we cross out multiples of a prime number, we essentially skip the tedious work of checking whether this multiple is a prime number or not: we already know that it isn’t.
A good question is why we start marking from the squared prime number instead of the prime number itself. For example, in the above figure where we just found prime number 7, we start marking from number 7*7 = 49. The reason is that we already marked all other multiples in previous iterations: 2*7, 3*7, 4*7, 5*7, 6*7. In other words, we have already marked all multiples of numbers that are smaller than the current prime number 7: 2, 3, 4, 5, 6.
Unveiling the One-Liner
Equipped with a thorough conceptual understanding of the algorithm, we can now start unveiling the one-liner solution:
## The One-Liner primes = reduce(lambda r, x: r - set(range(x**2, n, x)) if x in r else r, range(2, int(n**0.5) + 1), set(range(2,n)))
It’s very elegant but you need to invest some time to understand it. The reduce function takes three arguments: reduce(function, iterable, initializer)
. Here’s the relevant description from the documentation:
“Apply function of two arguments cumulatively to the items of sequence, from left to right, so as to reduce the sequence to a single value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). The left argument, x, is the accumulated value and the right argument, y, is the update value from the sequence. If the optional initializer is present, it is placed before the items of the sequence in the calculation, and serves as a default when the sequence is empty. If initializer is not given and sequence contains only one item, the first item is returned.”
The one-liner uses the reduce function to remove, one step at a time, all “marked” numbers from the initial set of all numbers between 2 and n
(in the one-liner: set(range(2, n))
). It takes this set as the initial value for the set of unmarked values r
because initially, all values are unmarked.
Now it goes over all numbers x
between 2 and the square root of n
(in the one-liner: range(2, int(n**0.5) + 1)
) and removes the multiples of x
from the set r
(starting at x**2
) – but only if the number x is a prime number (i.e., it is not removed from the set r
at this point in time).
Spend 5-15 minutes to reread this explanation and study the different parts of the one-liner carefully – I promise that after your initial confusion, you will find this exercise to be worth your invested time for that you’ve significantly progressed your Python code understanding skills.
The Original Sieve in Python (Multiple Lines)
If you’re looking for the real algorithm that’s not a one-liner, feel free to copy&paste this algorithm modified from here:
def sieve(n): # Initialize primary list: a = [True] * n a[0] = a[1] = False for (i, isprime) in enumerate(a): if isprime: yield i # Mark non-prime for j in range(i*i, n, i): a[j] = False print(list(sieve(100000)))
This uses largely the same idea of marking the non-prime numbers, as explained before.
Want to accelerate your Python skills and become a next-level coder? Becoming a Python master could easily be the most profitable decision of your career!
Python One-Liners Book: Master the Single Line First!
Python programmers will improve their computer science skills with these useful one-liners.
Python One-Liners will teach you how to read and write “one-liners”: concise statements of useful functionality packed into a single line of code. You’ll learn how to systematically unpack and understand any line of Python code, and write eloquent, powerfully compressed Python like an expert.
The book’s five chapters cover (1) tips and tricks, (2) regular expressions, (3) machine learning, (4) core data science topics, and (5) useful algorithms.
Detailed explanations of one-liners introduce key computer science concepts and boost your coding and analytical skills. You’ll learn about advanced Python features such as list comprehension, slicing, lambda functions, regular expressions, map and reduce functions, and slice assignments.
You’ll also learn how to:
- Leverage data structures to solve real-world problems, like using Boolean indexing to find cities with above-average pollution
- Use NumPy basics such as array, shape, axis, type, broadcasting, advanced indexing, slicing, sorting, searching, aggregating, and statistics
- Calculate basic statistics of multidimensional data arrays and the K-Means algorithms for unsupervised learning
- Create more advanced regular expressions using grouping and named groups, negative lookaheads, escaped characters, whitespaces, character sets (and negative characters sets), and greedy/nongreedy operators
- Understand a wide range of computer science topics, including anagrams, palindromes, supersets, permutations, factorials, prime numbers, Fibonacci numbers, obfuscation, searching, and algorithmic sorting
By the end of the book, you’ll know how to write Python at its most refined, and create concise, beautiful pieces of “Python art” in merely a single line.