5 Best Ways to Categorize a List by String Size in Python

💡 Problem Formulation: When working with lists in Python, a common task is to sort or categorize the elements based on certain criteria. Specifically, we might want to group strings based on their length. For example, given a list ['apple', 'kiwi', 'banana', 'pear', 'grape'], our goal is to categorize this list into a dictionary where each key represents the string length and the associated value is a list of strings of that length. The expected output would be {5: ['apple', 'grape'], 4: ['kiwi', 'pear'], 6: ['banana']}.

Method 1: Using defaultdict from Collections

Using defaultdict from the collections module is an efficient way to group strings by their size, as it simplifies the handling of non-existing keys. Upon accessing a missing key, defaultdict automatically initializes it with a default value (in this case, an empty list).

Here’s an example:

from collections import defaultdict

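def categorize_by_length(words):
    # Missing keys are created on first access with an empty list
    length_dict = defaultdict(list)
    for word in words:
        length_dict[len(word)].append(word)
    # Return a plain dict so it prints without the defaultdict wrapper
    return dict(length_dict)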

fruits = ['apple', 'kiwi', 'banana', 'pear', 'grape']
print(categorize_by_length(fruits))

Output:

{5: ['apple', 'grape'], 4: ['kiwi', 'pear'], 6: ['banana']}

In this snippet, we define a function categorize_by_length that takes a list of words as an argument. It creates a defaultdict with lists as default values, then iterates through each word, appending it to the correct list based on the word’s length. This method simplifies the code by automating the creation of keys and lists.
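One behavior worth knowing about is that a defaultdict creates a key whenever a missing key is read with square brackets, not only when a word is appended. The following is a minimal sketch of that side effect and of how a plain dict copy avoids it; the variable names (groups, plain, fallback) are illustrative and not part of the example above.

```python
from collections import defaultdict

fruits = ['apple', 'kiwi', 'banana', 'pear', 'grape']

groups = defaultdict(list)
for word in fruits:
    groups[len(word)].append(word)

# Take a plain-dict snapshot before any stray lookups happen.
plain = dict(groups)

# Indexing a missing key on the defaultdict inserts it as a side effect.
_ = groups[7]
defaultdict_grew = 7 in groups   # True: an empty list was created for key 7

# A plain dict with .get() returns a fallback without inserting anything.
fallback = plain.get(7, [])
plain_grew = 7 in plain          # False

print(defaultdict_grew, plain_grew)
```

If callers are only expected to read the result, handing back a plain dict keeps accidental lookups from adding empty groups.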

Method 2: Using groupby from itertools

The itertools.groupby function is a powerful tool for grouping items in an iterable by a specified key function. However, it is important to sort the list by the key function first because groupby only groups adjacent items.

Here’s an example:

from itertools import groupby

fruits = ['apple', 'kiwi', 'banana', 'pear', 'grape']
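# groupby only merges adjacent items, so sort by length first (in place)
fruits.sort(key=len)
# Each group g is a one-shot iterator, so materialize it with list()
length_group = {k: list(g) for k, g in groupby(fruits, key=len)}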

print(length_group)

Output:

{4: ['kiwi', 'pear'], 5: ['apple', 'grape'], 6: ['banana']}

This code first sorts the fruits list in place by the length of its elements. A dictionary comprehension then uses groupby to walk the sorted list and collect each run of equal-length strings into a list. Because of the sort, the keys appear in ascending order of length rather than in order of first appearance.
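To see why the sort matters, here is a small sketch that runs the same comprehension on the unsorted list and then on a sorted copy. It uses sorted() instead of list.sort() so the original list is left untouched; that swap is a choice made for this illustration, not something the example above requires.

```python
from itertools import groupby

fruits = ['apple', 'kiwi', 'banana', 'pear', 'grape']

# Without sorting, equal lengths are not adjacent. Each run becomes its own
# group, and later runs overwrite earlier ones under the same dict key.
unsorted_groups = {k: list(g) for k, g in groupby(fruits, key=len)}
print(unsorted_groups)  # {5: ['grape'], 4: ['pear'], 6: ['banana']}

# sorted() returns a new list, so `fruits` keeps its original order.
sorted_groups = {k: list(g) for k, g in groupby(sorted(fruits, key=len), key=len)}
print(sorted_groups)    # {4: ['kiwi', 'pear'], 5: ['apple', 'grape'], 6: ['banana']}
```

The unsorted version silently drops 'apple' and 'kiwi', which is the kind of bug that is easy to miss because no error is raised.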

Method 3: Using a Simple For Loop and Dictionary

Without using any additional libraries, we can categorize a list by using a simple for loop along with a standard dictionary. This method is straightforward and easy to understand.

Here’s an example:

fruits = ['apple', 'kiwi', 'banana', 'pear', 'grape']
length_dict = {}

for word in fruits:
    if len(word) in length_dict:
        length_dict[len(word)].append(word)
    else:
        length_dict[len(word)] = [word]

print(length_dict)

Output:

{5: ['apple', 'grape'], 4: ['kiwi', 'pear'], 6: ['banana']}

This snippet involves iterating over each item in the fruits list, checking if its length is already a key in the dictionary. If the key exists, the item is appended to the value list; if not, a new key-value pair is created with the item inside a new list.
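The same loop can be wrapped in a reusable function that computes each word’s length only once. This is a sketch of one possible arrangement; the function name group_by_length_plain and the local name n are hypothetical and chosen only for this illustration.

```python
def group_by_length_plain(words):
    """Group strings by length using only a built-in dict."""
    length_dict = {}
    for word in words:
        n = len(word)                    # compute the length once per word
        if n in length_dict:
            length_dict[n].append(word)  # key already present: extend its list
        else:
            length_dict[n] = [word]      # first word of this length: new list
    return length_dict

fruits = ['apple', 'kiwi', 'banana', 'pear', 'grape']
result = group_by_length_plain(fruits)
print(result)  # {5: ['apple', 'grape'], 4: ['kiwi', 'pear'], 6: ['banana']}
```

Keys appear in the order their lengths are first encountered, since a dict preserves insertion order in Python 3.7 and later.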

Method 4: Using a Lambda Function and the reduce Function

The reduce function from functools combines the elements of an iterable cumulatively, carrying an accumulator from one step to the next. By passing a lambda that files each word under its length, we can build the dictionary in a single expression.

Here’s an example:

from functools import reduce

fruits = ['apple', 'kiwi', 'banana', 'pear', 'grape']
length_dict = reduce(lambda d, w: (d[len(w)].append(w) or d) if len(w) in d else d.update({len(w): [w]}) or d, fruits, {})

print(length_dict)

Output:

{5: ['apple', 'grape'], 4: ['kiwi', 'pear'], 6: ['banana']}

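In this code, the lambda takes the accumulated dictionary d and a word w. If the word’s length is already a key, it appends the word to the existing list; otherwise it adds a new key with a one-element list. Because both list.append and dict.update return None, the or d at the end of each branch makes the lambda return the dictionary itself. reduce applies the lambda to every fruit in turn, starting from an empty dictionary, and the final accumulator is stored in length_dict.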
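If the lambda feels too dense, the same reduction can be written with a named function as the reducer. The sketch below keeps the reduce call but moves the branching into a helper; the name add_word is hypothetical and introduced only for this illustration.

```python
from functools import reduce

def add_word(groups, word):
    """Reducer step: file `word` under its length and return the accumulator."""
    n = len(word)
    if n in groups:
        groups[n].append(word)
    else:
        groups[n] = [word]
    return groups  # reduce passes this return value into the next step

fruits = ['apple', 'kiwi', 'banana', 'pear', 'grape']
length_dict = reduce(add_word, fruits, {})
print(length_dict)  # {5: ['apple', 'grape'], 4: ['kiwi', 'pear'], 6: ['banana']}
```

The explicit return replaces the or d trick, at the cost of a few extra lines.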

Bonus One-Liner Method 5: Using List Comprehension and setdefault

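A concise solution puts the whole grouping step on one line by combining a list comprehension with the dict.setdefault method, which returns the value for a key and first inserts a default if the key is missing. The dictionary itself still has to be created beforehand.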

Here’s an example:

fruits = ['apple', 'kiwi', 'banana', 'pear', 'grape']
length_dict = {}
[length_dict.setdefault(len(w), []).append(w) for w in fruits]

print(length_dict)

Output:

{5: ['apple', 'grape'], 4: ['kiwi', 'pear'], 6: ['banana']}

The above one-liner uses a list comprehension to iterate over each fruit. For every word, setdefault returns the list stored under that length, creating an empty one first if needed, and the word is appended to it. While compact, this method uses a list comprehension purely for its side effects and builds a throwaway list of None values along the way, which is generally discouraged in Python.
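For comparison, here is a sketch of the same setdefault call inside an ordinary for loop, next to the comprehension form with its result captured so the discarded list is visible. The names loop_dict, comp_dict, and discarded are illustrative only.

```python
fruits = ['apple', 'kiwi', 'banana', 'pear', 'grape']

# Plain loop: same setdefault call, no throwaway list.
loop_dict = {}
for w in fruits:
    loop_dict.setdefault(len(w), []).append(w)

# Comprehension form: list.append returns None, so the comprehension
# produces a list of None values that is normally just thrown away.
comp_dict = {}
discarded = [comp_dict.setdefault(len(w), []).append(w) for w in fruits]

print(loop_dict)   # {5: ['apple', 'grape'], 4: ['kiwi', 'pear'], 6: ['banana']}
print(discarded)   # [None, None, None, None, None]
```

Both forms build the same dictionary; the loop is one line longer and avoids the unused list.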

Summary/Discussion

Method 1: Using defaultdict. Strengths: Automatically handles missing keys; runs in a single pass. Weaknesses: Requires an import; a defaultdict also creates keys on lookup, so convert it to a plain dict if that matters.
Method 2: Using groupby. Strengths: Elegant and concise. Weaknesses: Requires sorting first, which costs O(n log n) and orders the keys by length rather than by first appearance.
Method 3: Using a simple for loop. Strengths: Easy to understand; no imports necessary. Weaknesses: More verbose than other methods.
Method 4: Using reduce with a lambda function. Strengths: Compact and functional. Weaknesses: Harder to read; relies on the or d trick to work around methods that return None.
Bonus Method 5: One-liner using list comprehension. Strengths: Extremely concise. Weaknesses: Uses a list comprehension for side effects and builds a throwaway list; may be confusing.
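
The outputs above differ only in key order: the single-pass methods list keys in order of first appearance, while groupby lists them in ascending order. The sketch below, using illustrative variable names, checks that two of the methods agree and shows one way to get a consistent key order for display.

```python
from collections import defaultdict
from itertools import groupby

fruits = ['apple', 'kiwi', 'banana', 'pear', 'grape']

# Single-pass grouping: keys in order of first appearance (5, 4, 6).
by_defaultdict = defaultdict(list)
for w in fruits:
    by_defaultdict[len(w)].append(w)

# Sort-then-group: keys in ascending order (4, 5, 6).
by_groupby = {k: list(g) for k, g in groupby(sorted(fruits, key=len), key=len)}

# Dict equality ignores key order, so the two results compare equal.
same_content = dict(by_defaultdict) == by_groupby

# Rebuild with sorted items to get ascending keys from any of the methods.
ordered = dict(sorted(by_defaultdict.items()))
print(same_content)  # True
print(ordered)       # {4: ['kiwi', 'pear'], 5: ['apple', 'grape'], 6: ['banana']}
```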