5 Best Ways to Remove Duplicate Lists in Tuples Preserving Order in Python


💡 Problem Formulation: When working with data structures in Python, it’s common to encounter lists of tuples in which some tuples contain duplicate lists. The challenge is to remove the tuples whose lists are duplicates while preserving the original order. For instance, given the input [([1, 2], 'a'), ([3, 4], 'b'), ([1, 2], 'c'), ([5], 'd')], we want to achieve the output [([1, 2], 'a'), ([3, 4], 'b'), ([5], 'd')].
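All of the methods below hinge on one detail: lists are mutable and therefore unhashable, so they cannot be set members or dict keys, but their tuple counterparts can. A quick illustration:

```python
# Lists are unhashable: putting one in a set raises TypeError.
try:
    {[1, 2]}
except TypeError as e:
    print(e)  # unhashable type: 'list'

# The tuple form of the same data is hashable and works fine.
print({(1, 2)})  # {(1, 2)}
```

This is why every method converts each inner list to a tuple before checking for duplicates.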

Method 1: Using OrderedDict

This method uses OrderedDict from the collections module, which remembers the order in which keys were inserted. Because lists are unhashable, each list is converted to a tuple and used as a key; duplicate keys are then dropped automatically while the original order is preserved.

Here’s an example:

from collections import OrderedDict

def remove_duplicates(tuples_list):
    unique_tuples = OrderedDict()
    for lst, val in tuples_list:
        # Lists are unhashable, so use the tuple form as the key;
        # setdefault() keeps the value from the first occurrence.
        unique_tuples.setdefault(tuple(lst), val)
    return [(list(k), v) for k, v in unique_tuples.items()]

test_data = [([1, 2], 'a'), ([3, 4], 'b'), ([1, 2], 'c'), ([5], 'd')]
result = remove_duplicates(test_data)
print(result)

Output: [([1, 2], 'a'), ([3, 4], 'b'), ([5], 'd')]

This code snippet defines a function remove_duplicates() that iterates over the given list of tuples, using each list as a key in an OrderedDict. The function preserves the first occurrence of each list and associated value, and then converts the keys back to lists in the final result.
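Since Python 3.7, plain dicts also guarantee insertion order, so the same idea works without the collections import. A minimal sketch (the helper name remove_duplicates_dict is mine):

```python
def remove_duplicates_dict(tuples_list):
    # Plain dicts preserve insertion order in Python 3.7+,
    # so OrderedDict is no longer strictly required.
    unique = {}
    for lst, val in tuples_list:
        unique.setdefault(tuple(lst), val)
    return [(list(k), v) for k, v in unique.items()]

test_data = [([1, 2], 'a'), ([3, 4], 'b'), ([1, 2], 'c'), ([5], 'd')]
print(remove_duplicates_dict(test_data))
# [([1, 2], 'a'), ([3, 4], 'b'), ([5], 'd')]
```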

Method 2: Using a Set for Seen Lists

The set-based approach tracks already-seen lists with a Python set. This is efficient, since sets offer constant-time membership tests. We build a new list, adding only entries that haven’t been seen before.

Here’s an example:

def remove_duplicates(tuples_list):
    seen = set()
    unique_tuples = []
    for lst, val in tuples_list:
        t_lst = tuple(lst)
        if t_lst not in seen:
            seen.add(t_lst)
            unique_tuples.append((lst, val))
    return unique_tuples

test_data = [([1, 2], 'a'), ([3, 4], 'b'), ([1, 2], 'c'), ([5], 'd')]
result = remove_duplicates(test_data)
print(result)

Output: [([1, 2], 'a'), ([3, 4], 'b'), ([5], 'd')]

The function remove_duplicates() iterates over each tuple in the list, converting each list to a tuple so it can be stored in a set (lists are unhashable). If the tuple form has not been seen before, it is added to the set and the original (list, value) pair is appended to the result.

Method 3: List Comprehension with Set

We can condense Method 2 into a list comprehension. The idea remains the same (a set for membership checks), but the implementation becomes more concise and Pythonic.

Here’s an example:

def remove_duplicates(tuples_list):
    seen = set()
    # set.add() returns None, so "not seen.add(...)" is always True
    # and serves only to record the tuple as seen (Python 3.8+ for :=).
    return [(lst, val) for lst, val in tuples_list
            if (t_lst := tuple(lst)) not in seen and not seen.add(t_lst)]

test_data = [([1, 2], 'a'), ([3, 4], 'b'), ([1, 2], 'c'), ([5], 'd')]
result = remove_duplicates(test_data)
print(result)

Output: [([1, 2], 'a'), ([3, 4], 'b'), ([5], 'd')]

The comprehension converts each list to a tuple and binds it to t_lst with the walrus operator := (Python 3.8+), so the conversion happens only once per item. An entry is kept only if t_lst is not already in seen; because set.add() returns None, not seen.add(t_lst) is always true and serves purely to record the tuple as seen.

Method 4: Custom Function with Generator

This method uses a custom generator function that yields unique entries one at a time. Generators are lazily evaluated, which can be efficient when the dataset is very large and only part of it is needed at a time.

Here’s an example:

def unique_generator(tuples_list):
    seen = set()
    for lst, val in tuples_list:
        t_lst = tuple(lst)
        if t_lst not in seen:
            seen.add(t_lst)
            yield lst, val

test_data = [([1, 2], 'a'), ([3, 4], 'b'), ([1, 2], 'c'), ([5], 'd')]
result = list(unique_generator(test_data))
print(result)

Output: [([1, 2], 'a'), ([3, 4], 'b'), ([5], 'd')]

The unique_generator() function is a generator that iterates over the list of tuples, checking if each list has been seen, adding unseen ones to the set, and yielding unique tuples. This allows for iteration over unique elements without creating a full list in memory.
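To see the lazy evaluation in action, the generator can be consumed partially, for example with itertools.islice, without ever materializing the full result:

```python
from itertools import islice

def unique_generator(tuples_list):
    seen = set()
    for lst, val in tuples_list:
        t_lst = tuple(lst)
        if t_lst not in seen:
            seen.add(t_lst)
            yield lst, val

test_data = [([1, 2], 'a'), ([3, 4], 'b'), ([1, 2], 'c'), ([5], 'd')]
# Take only the first two unique entries; the rest of the
# input is never examined.
first_two = list(islice(unique_generator(test_data), 2))
print(first_two)  # [([1, 2], 'a'), ([3, 4], 'b')]
```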

Bonus One-Liner Method 5: List Comprehension with Function Call

This variant of Method 3 binds the set’s add method to a local name before the comprehension runs. Calling the local reference inside the comprehension avoids a repeated attribute lookup, which can yield a small speedup.

Here’s an example:

def remove_duplicates(tuples_list):
    seen = set()
    seen_add = seen.add  # local alias skips the attribute lookup per item
    return [(lst, val) for lst, val in tuples_list
            if (t_lst := tuple(lst)) not in seen and not seen_add(t_lst)]

test_data = [([1, 2], 'a'), ([3, 4], 'b'), ([1, 2], 'c'), ([5], 'd')]
result = remove_duplicates(test_data)
print(result)

Output: [([1, 2], 'a'), ([3, 4], 'b'), ([5], 'd')]

The function remove_duplicates() binds the set’s add method to the local name seen_add before the comprehension runs. Using the local reference inside the comprehension gives a slight performance gain by bypassing the attribute lookup on every iteration.
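Whether the seen_add alias actually pays off depends on your Python version and data. A rough way to check on your own machine is a timeit comparison; this is a sketch, and the helper names dedupe_plain and dedupe_bound are mine:

```python
import timeit

def dedupe_plain(tuples_list):
    # Method 2: explicit loop with a set of seen tuples.
    seen = set()
    out = []
    for lst, val in tuples_list:
        t = tuple(lst)
        if t not in seen:
            seen.add(t)
            out.append((lst, val))
    return out

def dedupe_bound(tuples_list):
    # Method 5: comprehension with a pre-bound set.add (Python 3.8+).
    seen = set()
    seen_add = seen.add
    return [(lst, val) for lst, val in tuples_list
            if (t := tuple(lst)) not in seen and not seen_add(t)]

# 10,000 entries, 100 unique inner lists.
data = [([i % 100], i) for i in range(10_000)]
assert dedupe_plain(data) == dedupe_bound(data)

t_plain = timeit.timeit(lambda: dedupe_plain(data), number=100)
t_bound = timeit.timeit(lambda: dedupe_bound(data), number=100)
print(f"plain loop: {t_plain:.4f}s  bound-method: {t_bound:.4f}s")
```

The absolute numbers vary by machine; the point is only the relative difference.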

Summary/Discussion

  • Method 1: Using OrderedDict. Preserves order due to the nature of OrderedDict. Can be slower for very large datasets due to the overhead of maintaining order.
  • Method 2: Using a Set for Seen Lists. Efficient due to constant-time membership testing. Order is preserved by using a list to collect output. More verbose than a list comprehension.
  • Method 3: List Comprehension with Set. More concise and Pythonic due to list comprehension. Order is preserved, and efficiency is equivalent to method 2.
  • Method 4: Custom Function with Generator. Suitable for very large or streaming datasets due to lazy evaluation. Preserves memory but requires conversion to list for full output.
  • Bonus One-Liner Method 5: List Comprehension with Function Call. Most Pythonic and possibly slightly faster due to local function reference. However, the use of the walrus operator requires Python 3.8+.