5 Best Ways to Find the Largest Group of Anagrams from a Word List in Python

Rate this post

πŸ’‘ Problem Formulation: Given a list of words, the challenge is to find the largest subset of words that are anagrams of each other. An anagram is a word formed by rearranging the letters of another word, e.g., ‘listen’ and ‘silent’. We aim for a function where the input is ['cat', 'dog', 'tac', 'god', 'act'] and the desired output would be ['cat', 'tac', 'act'].

Method 1: Using defaultdict from collections

This method utilizes the defaultdict class from the collections module to group anagrams efficiently. By sorting the letters of each word and using the resulting string as a key, we can group anagrams together in a dictionary where each key corresponds to a list of words (anagrams).

Here’s an example:

from collections import defaultdict

def find_largest_anagram_group(words):
    anagrams = defaultdict(list)
    for word in words:
        anagrams[''.join(sorted(word))].append(word)
    return max(anagrams.values(), key=len)

words = ['cat', 'dog', 'tac', 'god', 'act']
print(find_largest_anagram_group(words))

The output of this code snippet is: ['cat', 'tac', 'act']

This code creates a defaultdict object that automatically initializes a new list for each new key. Every word from the input list is sorted to form a key, under which the original word is stored. Finally, the largest group of anagrams is returned by finding the value with the maximum length.

Method 2: Using Counter from collections

The Counter class from the collections module can be used to count the occurrences of each character in a word. This count can act as a key for grouping anagrams, as anagrams will have identical character counts.

Here’s an example:

from collections import defaultdict, Counter

def find_largest_anagram_group(words):
    anagrams = defaultdict(list)
    for word in words:
        anagrams[tuple(Counter(word).items())].append(word)
    return max(anagrams.values(), key=len)

words = ['cat', 'dog', 'tac', 'god', 'act']
print(find_largest_anagram_group(words))

The output of this code snippet is: ['cat', 'tac', 'act']

This approach constructs a Counter for each word, converting it into a tuple that serves as a unique key in the defaultdict for anagrams. The largest grouping is found in the same way as the previous method.

Method 3: Grouping with a Hash Function

A custom hash function can be used to encode word signatures. The signature of an anagram group is determined by the individual letter counts, which makes it an effective hashing strategy for grouping anagrams.

Here’s an example:

def hash_word(word):
    return ''.join(sorted(word))

def find_largest_anagram_group(words):
    anagrams = {}
    for word in words:
        word_hash = hash_word(word)
        if word_hash in anagrams:
            anagrams[word_hash].append(word)
        else:
            anagrams[word_hash] = [word]
    return max(anagrams.values(), key=len)

words = ['cat', 'dog', 'tac', 'god', 'act']
print(find_largest_anagram_group(words))

The output of this code snippet is: ['cat', 'tac', 'act']

This code defines a hash function that generates a signature by sorting the letters of a word. It then uses a standard dictionary to store lists of anagrams, keyed by their common hash. The largest group is then retrieved as the one with the most elements.

Method 4: Sorting and Tuple Conversion

Another intuitive way is to directly sort each word’s letters and convert them to tuples, which are hashable and can serve as effective dictionary keys for grouping anagrams.

Here’s an example:

def find_largest_anagram_group(words):
    anagrams = {}
    for word in words:
        key = tuple(sorted(word))
        if key not in anagrams:
            anagrams[key] = [word]
        else:
            anagrams[key].append(word)
    return max(anagrams.values(), key=len)

words = ['cat', 'dog', 'tac', 'god', 'act']
print(find_largest_anagram_group(words))

The output of this code snippet is: ['cat', 'tac', 'act']

By converting the sorted letter list of each word into a tuple, this method uses a simple dictionary to find and return the largest grouping of anagrams.

Bonus One-Liner Method 5: Using a Generator Expression with max()

The most concise method leverages a generator expression inside the max() function, combined with Python’s powerful list comprehensions, to identify the largest group of anagrams in a single line of code.

Here’s an example:

from itertools import groupby

words = ['cat', 'dog', 'tac', 'god', 'act']
largest_anagrams = max((list(group) for _, group in groupby(sorted(words, key=sorted), key=sorted)), key=len)
print(largest_anagrams)

The output of this code snippet is: ['act', 'cat', 'tac']

This one-liner sorts the words based on their sorted characters, groups them by anagrams using groupby(), and finally selects the largest group. Note that groupby() is sensitive to the order of input items, hence words need to be sorted first.

Summary/Discussion

  • Method 1: Using defaultdict from collections. This is a clear and efficient approach for grouping anagrams. However, it relies on the elements being hashable.
  • Method 2: Using Counter from collections. Counter objects provide a neat way to handle letter frequencies but may be overkill for simple anagram grouping and less efficient than sorting.
  • Method 3: Grouping with a Hash Function. Custom hash functions are flexible and can be optimized according to the use case, but they can also add complexity to the code.
  • Method 4: Sorting and Tuple Conversion. This method is straightforward and very readable but can be slower due to tuple creation for each word.
  • Bonus One-Liner Method 5: Using a Generator Expression with max(). This approach is elegant and concise but can be difficult to understand and debug for some users. It also might not be as performant with large datasets.