List Difference | The Most Pythonic Way

Short answer: The most Pythonic way to compute the difference between two lists l1 and l2 is to convert l2 to a set once, s = set(l2), and then use the list comprehension statement [x for x in l1 if x not in s]. This works even if you have duplicate list entries, it maintains the original list ordering, and it’s efficient because a set membership check takes constant time on average. Build the set outside the comprehension: writing set(l2) inside the condition would rebuild it for every element of l1.

What’s the best way to compute the difference between two lists in Python?

a = [5, 4, 3, 2, 1]
b = [4, 5, 6, 7]

# a - b == [3, 2, 1]
# b - a == [6, 7]

In Python, you often have multiple ways to solve the same (or a similar) problem.

Let’s dive into each of the methods to find the most Pythonic one for your particular scenario.

Method 1: Set Difference

The naive approach to solve this problem is to convert both lists into sets and use the set minus (or set difference) operation.

# Method 1: Set Difference
print(set(a) - set(b))
# {1, 2, 3}
print(set(b) - set(a))
# {6, 7}

This approach is elegant because it’s readable, efficient, and concise.

However, there are some unique properties to this method which you should be aware of:

  • The result is a set and not a list. You can convert it back to a list by using the list(...) constructor.
  • All duplicated list entries are removed in the process because sets cannot have duplicated elements.
  • The order of the original list is lost because sets do not maintain the ordering of the elements.
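A quick sketch of these three properties, using a variant of the list a with a duplicated 3 (the duplicate is an assumption added for illustration). Because a set has no guaranteed ordering, the example sorts the result instead of relying on whatever order the set happens to produce:

```python
a = [5, 4, 3, 3, 2, 1]  # note the duplicated 3
b = [4, 5, 6, 7]

diff = set(a) - set(b)
print(type(diff).__name__)  # set, not list

# Convert back to a list; sorted() gives a predictable order,
# but it is ascending order, not the original order of a.
as_list = sorted(diff)
print(as_list)  # [1, 2, 3] (the second 3 is gone)
```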

If all three properties are acceptable to you, this is by far the most efficient approach as evaluated later in this article!

However, how can you maintain the order of the original list elements while also allowing duplicates? Let’s dive into the list comprehension alternative!

Method 2: List Comprehension

List comprehension is a compact way of creating lists. The simple formula is [expression + context].

  • Expression: What to do with each list element?
  • Context: What elements to select? The context consists of an arbitrary number of for and if statements.
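As a small illustration of the formula, here is a sketch with an expression (x * x) and a context made of one for and one if clause. The list numbers is a made-up example, not part of the list difference problem:

```python
numbers = [1, 2, 3, 4, 5, 6]

# Expression: x * x
# Context:    for x in numbers if x % 2 == 0
squares_of_even = [x * x for x in numbers if x % 2 == 0]
print(squares_of_even)  # [4, 16, 36]
```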

You can use list comprehension to go over all elements in the first list but ignore them if they are in the second list:

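# Method 2: List Comprehension
b_set = set(b)  # build the set once, outside the comprehension
print([x for x in a if x not in b_set])
# [3, 2, 1]

We used a small but effective optimization: converting the second list b to a set once, before the comprehension runs. The reason is that the membership check x in b is much faster for sets than for lists. Writing set(b) inside the condition would rebuild the set for every element of a and throw the benefit away. Both variants return the same result as long as all list elements are hashable.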
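One caveat worth sketching: the set conversion only works when the list elements are hashable. With nested lists (a made-up example), set(b) raises a TypeError, and the plain list membership check is the fallback:

```python
a = [[1, 2], [3, 4], [5, 6]]
b = [[3, 4]]

try:
    set(b)  # lists are unhashable, so this fails
except TypeError as err:
    print("cannot build a set:", err)

# Fall back to the slower membership check against the list itself.
result = [x for x in a if x not in b]
print(result)  # [[1, 2], [5, 6]]
```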

Here are the distinctive properties of this approach:

  • The result of the list comprehension statement is a list.
  • The order of the original list is maintained.
  • Duplicate elements are maintained.
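To see these properties in action, here is a minimal sketch that wraps the comprehension in a helper. The function name list_difference and the duplicated 3 in a are assumptions made for this example:

```python
def list_difference(l1, l2):
    """Return the elements of l1 that are not in l2 (hypothetical helper).

    Keeps the order of l1 and any duplicates in l1.
    """
    exclude = set(l2)  # built once, then reused for every lookup
    return [x for x in l1 if x not in exclude]


a = [5, 4, 3, 3, 2, 1]  # note the duplicated 3
b = [4, 5, 6, 7]

print(list_difference(a, b))  # [3, 3, 2, 1]
print(list_difference(b, a))  # [6, 7]
```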

If you rely on these more powerful guarantees, use the list comprehension approach because it’s the most Pythonic one.

Method 3: Simple For Loop

Surprisingly, some online tutorials recommend using a nested for loop:

# Method 3: Nested For Loop
d = []
for x in a:
    if x not in b:
        d.append(x)
print(d)
# [3, 2, 1]

In my opinion, this approach would only be used by absolute beginners or coders who come from other programming languages such as C++ or Java and don’t know essential Python features like list comprehension. You can optimize this method by converting the list b to a set first to accelerate the check if x not in b by a significant margin.
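The optimization mentioned above could look like the following sketch, where b_set is a name introduced for this example:

```python
a = [5, 4, 3, 2, 1]
b = [4, 5, 6, 7]

b_set = set(b)  # convert once, before the loop starts
d = []
for x in a:
    if x not in b_set:  # average constant-time lookup
        d.append(x)
print(d)  # [3, 2, 1]
```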

Performance Evaluation

Want to know the most performant one? In the following, I tested three different approaches:

import timeit

init = 'l1 = list(range(100)); l2 = list(range(50))'

# 1. Set Conversion
print(timeit.timeit('list(set(l1) - set(l2))', init, number = 10000))

# 2. List Comprehension
print(timeit.timeit('[x for x in l1 if x not in l2]', init, number = 10000))

# 3. List Comprehension + set
print(timeit.timeit('s = set(l2);[x for x in l1 if x not in s]', init, number = 10000))

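Although the first approach seems to be fastest, you now know that it has some disadvantages, too: it loses duplicate information and ordering information. Of the two list comprehension approaches, the set-based one clearly beats the plain-list one in both runtime complexity and performance: each membership check against the list l2 takes time proportional to its length, whereas a set lookup takes constant time on average.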
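If you want to see how the gap between the two list comprehension variants grows with the input size, a sketch like the following may help. The list sizes and the repetition count are arbitrary choices for illustration, and the measured times will differ from machine to machine:

```python
import timeit

# Larger inputs than in the benchmark above (sizes are assumptions).
l1 = list(range(2000))
l2 = list(range(1000))
s = set(l2)

# Membership check against the list: scans l2 for every x.
t_list = timeit.timeit(lambda: [x for x in l1 if x not in l2], number=10)

# Membership check against the prebuilt set: one hash lookup per x.
t_set = timeit.timeit(lambda: [x for x in l1 if x not in s], number=10)

print(f"list lookup: {t_list:.4f} s")
print(f"set lookup:  {t_set:.4f} s")
```

Both statements produce the same list, so any difference in the printed times comes from the membership check alone.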
