{Brackets} A Simple Introduction to Set Comprehension in Python

In this article, I give you everything you need to know about set comprehensions using the curly-brace notation {}. For example, the set comprehension {x for x in range(10)} creates the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Newbies often hate this feature, but experienced Python coders can’t live without it.
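As a quick sanity check, here is a minimal sketch that evaluates the comprehension from the introduction. It also shows a detail that is easy to trip over: empty curly braces create a dictionary, not a set. The variable names (digits, empty) are only chosen for illustration.

```python
# Set comprehension from the introduction
digits = {x for x in range(10)}
print(digits == set(range(10)))  # True

# Careful: empty braces are an empty dict, not an empty set
empty = {}
print(type(empty).__name__)  # dict
print(type(set()).__name__)  # set
```

So if you need an empty set, write set() rather than {}.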

What is set comprehension?

Set comprehension is a concise way of creating sets. Say you want to select all customers from your database who earn more than $1,000,000. This is what a newbie who doesn’t know set comprehension would do:

# (name, $-income)
customers = [("John", 240000),
            ("Alice", 120000),
            ("Ann", 1100000),
            ("Zach", 44000)]


# your high-value customers earning >$1M
whales = set()
for customer, income in customers:
    if income > 1000000:
        whales.add(customer)


print(whales)
# {'Ann'}

This snippet needs four lines just to create a set of high-value customers (whales)!

If you do that in your public Python code base, be prepared to get busted for “not writing Pythonic code”. 😉

Instead, a much better way of doing the same thing is to use set comprehension:

whales = {x for x,y in customers if y>1000000}
print(whales)
# {'Ann'}

Beautiful, isn’t it?

Set comprehension is dead simple when you know the formula I will show you in a moment. So why are people confused about how to use set comprehension? Because they never looked up the most important statement on list comprehension (which is similar to set comprehension) in the Python documentation. It’s this:

“A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses. The result will be a new list resulting from evaluating the expression in the context of the for and if clauses which follow it.”

Official Python Doc

Here is the formula for set comprehension. That’s the one thing you should take home from this article: Set comprehension consists of two parts.

'{' + expression + context + '}'

The first part is the expression. In the example above it was the variable x. But you can also use a more complex expression such as x.upper(). Use any variable in your expression that you have defined in the context within a loop statement. See this example:

whales = {x.upper() for x,y in customers if y>1000000}
print(whales)
# {'ANN'}
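To make the "expression" part a bit more concrete, here is a small sketch with two more expressions over the same customer data. The names name_lengths and in_thousands are my own illustrative choices, not part of the original example.

```python
customers = [("John", 240000),
             ("Alice", 120000),
             ("Ann", 1100000),
             ("Zach", 44000)]

# The expression can be any Python expression over the loop variables
name_lengths = {len(name) for name, income in customers}
print(name_lengths == {3, 4, 5})  # True: "John" and "Zach" both map to 4

# A tuple is a valid (hashable) expression, too
in_thousands = {(name, income // 1000) for name, income in customers}
print(("Ann", 1100) in in_thousands)  # True
```

Note that four customers yield only three name lengths: a set keeps each result of the expression only once.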

The second part is the context. The context consists of one for clause followed by an arbitrary number of additional for and if clauses. The single goal of the context is to define (or restrict) the collection of elements to which we want to apply the expression. That’s why you sometimes see complex restrictions such as this:

small_fishes = {x + str(y) for x,y in customers if y<1000000 if x!='John'}
# (John is not a small fish...)
print(small_fishes)
# {'Zach44000', 'Alice120000'}
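If the two chained if clauses look odd, the following sketch may help. It compares them with a single condition joined by and, and with the explicit loop they stand for. The names chained, combined, and looped are just labels I picked for the comparison.

```python
customers = [("John", 240000),
             ("Alice", 120000),
             ("Ann", 1100000),
             ("Zach", 44000)]

# Two chained if clauses, as in the example above
chained = {x + str(y) for x, y in customers if y < 1000000 if x != 'John'}

# One if clause with both conditions joined by "and"
combined = {x + str(y) for x, y in customers if y < 1000000 and x != 'John'}

# The same logic written as explicit nested statements
looped = set()
for x, y in customers:
    if y < 1000000:
        if x != 'John':
            looped.add(x + str(y))

print(chained == combined == looped)  # True
```

All three variants build the same set, so which one you use is mostly a matter of readability.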

That’s about it!

Set comprehension is easy once you have invested one or two coffee breaks into your thorough understanding. Consider this done! If you wonder why I discretize time into short 5-minute slots called “coffee breaks”, read my book “Coffee Break Python” ;).

To sum up, remember this one formula from this article: set comprehension = '{' + expression + context + '}'.

How does nested set comprehension work in Python?

After publishing the first version of this article, many readers asked me to write a follow-up article on nested set comprehension in Python.

Coming from a computer science background, I was assuming that “nested set comprehension” refers to the creation of a set of sets. But after a bit of research, I learned that you cannot build a set of sets because sets are mutable and therefore not hashable (only the immutable frozenset is). Of course. How stupid I was!
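Here is a small sketch of the hashability problem. It also shows one possible workaround that goes slightly beyond this article: the built-in frozenset is an immutable set type, so it can be an element of another set. The groups data is made up for illustration.

```python
# A set of regular sets fails because set objects are not hashable
try:
    nested = {{1, 2}, {3}}
except TypeError as err:
    print("TypeError:", err)

# frozenset is immutable and hashable, so it can live inside a set
groups = [[1, 2], [2, 1], [3]]
set_of_sets = {frozenset(group) for group in groups}
print(len(set_of_sets))  # 2, because {1, 2} and {2, 1} are the same set
```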

Instead, most coders mean something different when asking “how does nested set comprehension work?”. They want to know how to use a nested for loop to create a simple set of hashable items.

To be frank, this is super-simple stuff. Do you remember the formula of set comprehension (= '{' + expression + context + '}')?

The context is an arbitrarily complex construct of for loops and if restrictions with the goal of specifying the data items on which the expression should be applied.

In the expression, you can use any variable you define within a for loop in the context. Let’s have a look at an example.

Suppose you want to use set comprehension to make this code more concise (for example, you want to find all possible pairs of users in your social network application):

# BEFORE
users = ["John", "Alice", "Ann", "Zach"]
pairs = set()
for x in users:
    for y in users:
        if x != y:
            pairs.add((x, y))
print(pairs)
# {('Zach', 'Alice'), ('John', 'Ann'), ('Alice', 'Zach'), ('Ann', 'John'), ('Alice', 'Ann'), ('Alice', 'John'), ('Zach', 'John'), ('Zach', 'Ann'), ('John', 'Zach'), ('Ann', 'Zach'), ('John', 'Alice'), ('Ann', 'Alice')}

Now, this code is a mess! How can we fix it? Simply use nested set comprehension!

# AFTER
pairs = {(x,y) for x in users for y in users if x!=y}
print(pairs)
# {('Ann', 'Zach'), ('Zach', 'John'), ('Alice', 'John'), ('Ann', 'Alice'), ('Ann', 'John'), ('Alice', 'Zach'), ('Alice', 'Ann'), ('John', 'Zach'), ('Zach', 'Ann'), ('John', 'Ann'), ('Zach', 'Alice'), ('John', 'Alice')}

As you can see, we are doing exactly the same thing as with un-nested set comprehension. The only difference is that the two for loops and the if statement are written in a single line within the curly braces {}.
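If you want to convince yourself that both versions really build the same set, a minimal check could look like the sketch below. The comparison with itertools.permutations is my own addition: for a list of unique names, it should yield the same ordered pairs.

```python
from itertools import permutations

users = ["John", "Alice", "Ann", "Zach"]

# Nested set comprehension: the for clauses read left to right,
# in the same order as the nested loops below
pairs = {(x, y) for x in users for y in users if x != y}

# Equivalent explicit loops
looped = set()
for x in users:
    for y in users:
        if x != y:
            looped.add((x, y))

print(pairs == looped)  # True
print(len(pairs))  # 12 = 4 * 3 ordered pairs
print(pairs == set(permutations(users, 2)))  # True
```

Because sets are unordered, the printed order of the pairs can differ between runs, which is why the check compares the sets instead of their printed form.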

What is the difference between list comprehension and set comprehension in Python?

There are two differences between list comprehension and set comprehension.

  • Braces vs. brackets: Do you want to generate a set? Use curly braces {}. Do you want to generate a list? Use square brackets [].
  • The data type of the result: list comprehension generates a list, which keeps order and duplicates; set comprehension generates a set, which is unordered and contains each element only once.

Apart from the enclosing brackets, list comprehension and set comprehension are syntactically identical.
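The choice of brackets has a practical consequence that is easy to see in a small sketch. The incomes list below is made-up data that contains one duplicate on purpose.

```python
# Hypothetical data with a duplicate value
incomes = [240000, 120000, 240000, 44000]

# Same expression and context, only the brackets differ
as_list = [y // 1000 for y in incomes]
as_set = {y // 1000 for y in incomes}

print(as_list)  # [240, 120, 240, 44]
print(len(as_set))  # 3
print(as_set == {240, 120, 44})  # True
```

The list keeps the duplicate and the original order, while the set stores each value only once and makes no promise about order.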

Where to go from here?

Learning Python is hard. There are myriads of new concepts and tiny details (like set comprehension) that you have to master. To help you with this gigantic task, I have created a lightweight Python email course, where I explain one such tiny Python concept at a time. For you, it’s like learning on autopilot. Follow me into the rabbit hole. My subscribers love it!