5 Best Ways to Remove Duplicate Entries in a Linked List in Python

πŸ’‘ Problem Formulation: When working with linked lists in Python, a common problem is the presence of duplicate nodes, i.e., multiple nodes holding the same data value. This is an issue in applications that require uniqueness of elements. For example, given the linked list 1 -> 3 -> 3 -> 2 -> 1, the goal is to transform it into 1 -> 3 -> 2 by removing duplicate entries while keeping the first occurrence of each value.

Method 1: Using Hashtable Tracking

This method iterates through the linked list while keeping track of seen values in a hash-based container (here, a Python set). If a node whose value has already been seen is encountered, it is unlinked from the list. This guarantees that each value remains in the linked list exactly once, in order of first appearance, at the cost of O(n) extra space.

Here’s an example:

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def remove_duplicates(head):
    seen = set()   # values encountered so far
    current = head
    previous = None
    while current:
        if current.data in seen:
            # Duplicate: bypass the current node.
            previous.next = current.next
        else:
            seen.add(current.data)
            previous = current
        current = current.next
    return head

Running this on the example list 1 -> 3 -> 3 -> 2 -> 1 produces 1 -> 3 -> 2.

This code snippet defines a Node class for our linked list and a remove_duplicates function. The function uses a set to keep track of unique elements and removes nodes with duplicate data. The previous pointer is critical for bypassing the unwanted nodes.
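
To see the method in action, here is a minimal driver; build_list and to_pylist are illustrative helpers rather than part of the method itself, and they are reused in the demos for the later methods:

def build_list(values):
    # Chain the values into Nodes, returning the head of the list.
    head = None
    for value in reversed(values):
        node = Node(value)
        node.next = head
        head = node
    return head

def to_pylist(head):
    # Copy the node data back into a plain Python list for printing.
    result = []
    while head:
        result.append(head.data)
        head = head.next
    return result

head = build_list([1, 3, 3, 2, 1])
print(to_pylist(remove_duplicates(head)))  # [1, 3, 2]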

Method 2: Nested Iteration Comparison

This method uses two pointers and no auxiliary data structure. The first pointer, current, iterates through the list, while the second pointer, runner, checks all subsequent nodes for duplicates of current's value and unlinks any it finds. It runs in O(n^2) time but needs only O(1) extra memory.

Here’s an example:

def remove_duplicates(head):
    current = head
    while current:
        # Unlink every later node that repeats current's value.
        runner = current
        while runner.next:
            if runner.next.data == current.data:
                runner.next = runner.next.next
            else:
                runner = runner.next
        current = current.next
    return head

Running this on 1 -> 3 -> 3 -> 2 -> 1 again produces 1 -> 3 -> 2.

This code snippet uses a two-pointer approach. The current pointer goes through each node, while runner checks for duplicates ahead in the list. When duplicates are found, they are skipped over, effectively removing them from the list.
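
Reusing the build_list and to_pylist helpers sketched in the Method 1 demo, the in-place behavior can be verified the same way:

head = build_list([1, 3, 3, 2, 1])
print(to_pylist(remove_duplicates(head)))  # [1, 3, 2]

On a list of n nodes this performs on the order of n^2 comparisons, which is why it is best reserved for short lists.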

Method 3: Sort and Remove

If the list can be sorted, duplicates become adjacent and are easy to remove in a single pass. This method assumes the list has already been sorted and then simply compares each node to its successor, unlinking the successor whenever the two values match; a sketch of one way to perform the sorting step follows the example below.

Here’s an example:

# Note: This assumes the linked list is already sorted, so that
# equal values sit next to each other.
def remove_duplicates_sorted(head):
    if not head:
        return None
    current = head
    while current.next:
        if current.data == current.next.data:
            # Duplicate neighbor: splice it out.
            current.next = current.next.next
        else:
            current = current.next
    return head

For the example list, sorting gives 1 -> 1 -> 2 -> 3 -> 3, and removing adjacent duplicates then yields 1 -> 2 -> 3. Note that the original element order is lost.

After sorting the list, the remove_duplicates_sorted function iterates through it. Since duplicates are now adjacent, the function can easily remove them by setting the next pointer of a node to skip the duplicate.
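
The sorting step itself is left abstract above. One minimal sketch, assuming it is acceptable to rewrite node values rather than relink nodes, is to copy the data out, sort it with Python's built-in sort, and write it back; it reuses the build_list and to_pylist demo helpers from Method 1:

def sort_list(head):
    # Pull the values into a Python list, sort them, and write
    # them back into the existing nodes in order.
    values = []
    node = head
    while node:
        values.append(node.data)
        node = node.next
    values.sort()
    node = head
    for value in values:
        node.data = value
        node = node.next
    return head

head = sort_list(build_list([1, 3, 3, 2, 1]))
print(to_pylist(remove_duplicates_sorted(head)))  # [1, 2, 3]

This sketch costs O(n) auxiliary space; an in-place linked-list merge sort would avoid that at the price of more code.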

Method 4: Recursive Deduplication

This approach uses recursion to remove duplicates from a linked list. Starting from the head, it removes all nodes of the same value as the head, then applies the same logic recursively to the remaining list.

Here’s an example:

def remove_duplicates(head):
    if not head:
        return head
    runner = head
    while runner.next:  # unlink later repeats of head's value
        if runner.next.data == head.data:
            runner.next = runner.next.next
        else:
            runner = runner.next
    head.next = remove_duplicates(head.next)
    return head

Running this on 1 -> 3 -> 3 -> 2 -> 1 produces 1 -> 3 -> 2.

The recursive function unlinks every later occurrence of the head's value and then applies the same treatment to the rest of the list. This is a succinct solution, but every node adds a stack frame, so long lists can exhaust Python's recursion limit and raise RecursionError.
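
Because Python does not eliminate tail calls, the recursion depth grows linearly with the list length. CPython's default limit is around 1000 frames; it can be inspected, and cautiously raised, through the sys module:

import sys

print(sys.getrecursionlimit())  # typically 1000 in CPython

# Raising the limit only postpones the problem and can crash the
# interpreter if set far too high; prefer an iterative method for
# long lists.
# sys.setrecursionlimit(10_000)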

Bonus One-Liner Method 5: Using Collections Counter (Not recommended for actual linked lists)

This method is not practical for linked list data structures, but it demonstrates how Python's built-in libraries can remove duplicates from a collection. It uses collections.Counter to count occurrences and then rebuilds the list from the unique keys.

Here’s an example:

from collections import Counter

def remove_duplicates(lst):
    # Counter is a dict subclass, so its keys preserve first-seen order.
    return list(Counter(lst).keys())

Output of this code will be [1, 3, 2] for the example input [1, 3, 3, 2, 1].

This one-liner simplifies the process drastically. Counter builds a hash-based mapping from each element to its count, and converting its keys back into a list keeps one entry per value. It does, however, assume a regular Python list rather than a true linked list structure.
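
For plain Python lists, an equally concise alternative, and arguably the more common idiom, is dict.fromkeys, which likewise keeps only the first occurrence of each value on Python 3.7 and later:

def remove_duplicates(lst):
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(lst))

print(remove_duplicates([1, 3, 3, 2, 1]))  # [1, 3, 2]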

Summary/Discussion

  • Method 1: Hashtable Tracking. Strengths: Simple and efficient, a single O(n) pass. Weaknesses: Requires O(n) extra space for the set of seen values.
  • Method 2: Nested Iteration Comparison. Strengths: No extra space needed. Weaknesses: O(n^2) time makes it slow for large lists.
  • Method 3: Sort and Remove. Strengths: Simple single-pass removal once the list is sorted. Weaknesses: Needs an O(n log n) sorting step and destroys the original element order.
  • Method 4: Recursive Deduplication. Strengths: Conceptually simple and elegant. Weaknesses: O(n^2) time, plus linear recursion depth with the risk of exceeding Python's recursion limit.
  • Bonus Method 5: Using Collections Counter. Strengths: Very concise. Weaknesses: Works on plain Python lists, not actual linked list structures, so it sidesteps the pointer manipulation the other methods demonstrate.