5 Best Ways to Group Anagrams from a Given List in Python

March 11, 2024 by Emily Rosemary Collins

💡 Problem Formulation: You have a list of words, and your goal is to group the words that are anagrams of one another. An anagram is a word or phrase that can be formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once. For instance, the input ['bat', 'tab', 'cat', 'act', 'tac'] should yield an output such as [['bat', 'tab'], ['cat', 'act', 'tac']].

Method 1: Using Defaultdict and Sorted

This method involves using a defaultdict from the collections module to group words by the sorted tuple of their characters. This works because anagrams will have identical sorted tuples. It is a simple yet effective way to group anagrams in Python.

Here’s an example:

import collections

def group_anagrams(words):
    anagrams = collections.defaultdict(list)
    for word in words:
        anagrams[tuple(sorted(word))].append(word)
    return list(anagrams.values())

print(group_anagrams(['bat', 'tab', 'cat', 'act', 'tac']))

Output: [['bat', 'tab'], ['cat', 'act', 'tac']]

This code snippet creates a defaultdict where each key is a tuple of sorted letters, and the corresponding value is a list of words that, when sorted, match this tuple. The function returns a list of these groups.

Method 2: Using Counter from Collections

The Counter from the collections module can be used to create a hashable count representation of each word’s letters. This makes it possible to match anagrams by their letter count, a unique solution for words with the same frequency of characters.

Here’s an example:

from collections import Counter

def group_anagrams(words):
    anagrams = collections.defaultdict(list)
    for word in words:
        anagrams[frozenset(Counter(word).items())].append(word)
    return list(anagrams.values())

print(group_anagrams(['bat', 'tab', 'cat', 'act', 'tac']))

Output: [['bat', 'tab'], ['cat', 'act', 'tac']]

In this approach, we create a defaultdict and populate it with a frozenset of counters of each word. Since Counter objects are not hashable by default, we convert them into a frozenset, which serves as the key in our defaultdict.

Method 3: Grouping by Prime Number Multiplication

This method maps each letter to a unique prime number and multiplies these to get a product that will uniquely represent an anagram class. The use of prime numerals ensures that the product will be the same for anagrams and different for non-anagrams.

Here’s an example:

def prime_product(word):
    primes = {'a': 2, 'b': 3, 'c': 5, 'd': 7, 'e': 11, 'f': 13, 'g': 17, 'h': 19, 'i': 23, 
              'j': 29, 'k': 31, 'l': 37, 'm': 41, 'n': 43, 'o': 47, 'p': 53, 'q': 59, 'r': 61, 
              's': 67, 't': 71, 'u': 73, 'v': 79, 'w': 83, 'x': 89, 'y': 97, 'z': 101}
    prod = 1
    for char in word:
        prod *= primes[char]
    return prod

def group_anagrams(words):
    anagrams = collections.defaultdict(list)
    for word in words:
        anagrams[prime_product(word)].append(word)
    return list(anagrams.values())

print(group_anagrams(['bat', 'tab', 'cat', 'act', 'tac']))

Output: [['bat', 'tab'], ['cat', 'act', 'tac']]

Each letter is assigned a unique prime number and the product of these primes for each word is computed using the prime_product function. Anagrams will have the same prime product, thus are grouped together in the defaultdict.

Method 4: Using itertools and sorted()

This method makes use of itertools’ groupby function to group words with identical signatures, where a signature is the sorted word. It is a functional programming-inspired technique that focuses on transforming the list of words to a stream of grouped anagrams.

Here’s an example:

from itertools import groupby

def group_anagrams(words):
    sorted_words = sorted(words, key=sorted)
    grouped = groupby(sorted_words, sorted)
    return [list(group) for _, group in grouped]

print(group_anagrams(['bat', 'tab', 'cat', 'act', 'tac']))

Output: [['bat', 'tab'], ['cat', 'act', 'tac']]

The groupby function is used to group the list of words that have been sorted alphabetically. The sorted words serve as iterators for the groupby function, and the result is a list of lists containing the anagrams.

Bonus One-Liner Method 5: Using List Comprehension and sorted()

A concise one-liner approach is to use list comprehension combined with sorted(). This method makes use of Python’s ability to express complex operations in a compact manner but may be less readable for those unfamiliar with list comprehensions.

Here’s an example:

group_anagrams = lambda words: [list(group) for key, group in itertools.groupby(sorted(words, key=sorted), sorted)]

print(group_anagrams(['bat', 'tab', 'cat', 'act', 'tac']))

Output: [['bat', 'tab'], ['cat', 'act', 'tac']]

This is a one-liner lambda function compressing the functionality of Method 4 into a single statement. While compact, it leverages sorting and grouping in one expression.

Summary/Discussion

Method 1: Using Defaultdict and Sorted. Strengths: Intuitive and straightforward to implement. Weaknesses: Requires additional memory for the dictionary.
Method 2: Using Counter from Collections. Strengths: Efficient for words with many repeating characters. Weaknesses: Sub-optimal for words with unique characters due to Counter overhead.
Method 3: Grouping by Prime Number Multiplication. Strengths: Highly efficient and no sorting required. Weaknesses: Need to define a map between characters and prime numbers, which can become cumbersome for extended alphabets.
Method 4: Using itertools and sorted(). Strengths: Elegant functional programming approach. Weaknesses: Possibly less intuitive for those unfamiliar with itertools.
Method 5: Bonus One-Liner Using List Comprehension and sorted(). Strengths: Extremely concise. Weaknesses: Potentially difficult to read and understand at a glance.