5 Effective Python Programs to Count Word Frequencies Using Dictionaries

πŸ’‘ Problem Formulation: Efficiently determining the frequency of each word in a string is a common task in text analysis. For an input string like “apple banana apple orange apple grape”, the desired output would be a dictionary such as {'apple': 3, 'banana': 1, 'orange': 1, 'grape': 1}, where each dictionary key represents a unique word and the corresponding value represents the frequency of that word in the string.

Method 1: Using a Regular Dictionary

This traditional method iterates over a list of words and increments the count for each word in a standard dictionary. The input string is split into words with split(), and the get() method handles both inserting new words and updating existing counts. (For real-world text you would typically also sanitize the input, e.g. lowercase it and strip punctuation, before counting.)

Here’s an example:

text = "apple banana apple orange apple grape"
word_counts = {}
for word in text.split():
    word_counts[word] = word_counts.get(word, 0) + 1
print(word_counts)

The output of this code snippet:

{'apple': 3, 'banana': 1, 'orange': 1, 'grape': 1}

First, we split the string into a list of words. Then we iterate through each word, using get() to fetch its current count (defaulting to zero if the word has not been seen yet) and incrementing it by one.
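Note that this snippet counts case-sensitively and treats punctuation as part of a word, so "Apple" and "apple," would be tallied separately. A minimal sanitization pass, sketched here with lower() and str.translate() to strip punctuation (one possible choice, not the only one), might look like this:

```python
import string

text = "Apple, banana! Apple; orange... APPLE grape"
# Lowercase, then remove all ASCII punctuation; adjust the translation
# table if you want to keep characters such as intra-word apostrophes.
cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))

word_counts = {}
for word in cleaned.split():
    word_counts[word] = word_counts.get(word, 0) + 1
print(word_counts)
# {'apple': 3, 'banana': 1, 'orange': 1, 'grape': 1}
```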

Method 2: Using collections.defaultdict

The defaultdict type from the collections module eliminates the need to check whether a word is already a key in the dictionary. It supplies a default value for missing keys via a factory function; here the factory is int, which returns zero.

Here’s an example:

from collections import defaultdict
text = "apple banana apple orange apple grape"
word_counts = defaultdict(int)
for word in text.split():
    word_counts[word] += 1
print(dict(word_counts))

The output of this code snippet:

{'apple': 3, 'banana': 1, 'orange': 1, 'grape': 1}

With defaultdict, we don't need get() to handle new words: each word's count is incremented directly, without any existence check, which makes the code slightly cleaner.
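To see why no existence check is needed, here is a short illustration of how defaultdict(int) behaves when a missing key is accessed:

```python
from collections import defaultdict

counts = defaultdict(int)   # int() supplies 0 for any missing key
print(counts["unseen"])     # accessing a missing key creates it with value 0
counts["unseen"] += 1
print(counts["unseen"])     # 1
```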

Method 3: Using collections.Counter

Python's collections module provides the Counter class, designed specifically for counting hashable objects, which makes it a perfect fit for counting words. It both simplifies the code and yields a more readable, efficient solution.

Here’s an example:

from collections import Counter
text = "apple banana apple orange apple grape"
word_counts = Counter(text.split())
print(word_counts)

The output of this code snippet:

Counter({'apple': 3, 'banana': 1, 'orange': 1, 'grape': 1})

In this snippet, the Counter class builds the mapping in one step: keys are words and values are their counts. Since Counter is a dict subclass, it can be used anywhere a plain dictionary is expected. This eliminates the explicit loop used in the other methods, making the code more concise and efficient.
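Beyond plain counting, Counter offers helpers the other methods lack; for example, most_common() returns words ordered by frequency, and looking up a missing word yields 0 instead of raising a KeyError:

```python
from collections import Counter

word_counts = Counter("apple banana apple orange apple grape".split())
print(word_counts.most_common(1))   # [('apple', 3)], the top word
print(word_counts["missing"])       # 0, not a KeyError
```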

Method 4: Using Python’s Dictionary Comprehension

A dictionary comprehension can build the frequency dictionary by iterating over set(word_list), so that each unique word is processed only once, and using list.count() to tally its occurrences.

Here’s an example:

text = "apple banana apple orange apple grape"
word_list = text.split()
word_counts = {word: word_list.count(word) for word in set(word_list)}
print(word_counts)

The output of this code snippet:

{'banana': 1, 'orange': 1, 'apple': 3, 'grape': 1}

This snippet illustrates a dictionary comprehension over the set of unique words. Each unique word is visited only once, but list.count() rescans the entire list on every call, so the total work grows with the number of unique words times the length of the list.

Bonus One-Liner Method 5: Using map() and dict()

This one-liner combines map(), which pairs each unique word with its count, and dict(), which converts the resulting iterable of tuples into a dictionary. It is very concise but sacrifices some readability.

Here’s an example:

text = "apple banana apple orange apple grape"
word_counts = dict(map(lambda word: (word, text.split().count(word)), set(text.split())))
print(word_counts)

The output of this code snippet:

{'banana': 1, 'orange': 1, 'apple': 3, 'grape': 1}

The lambda inside map() produces a (word, frequency) tuple for each unique word in the split string; dict() then assembles these tuples into a dictionary. Note that text.split() is re-evaluated inside the lambda for every unique word, so for anything but short strings it is better to split once and store the result in a variable.

Summary/Discussion

  • Method 1: Using a Regular Dictionary. Strength: Straightforward and simple. Weakness: Requires manual handling of word existence checks.
  • Method 2: Using collections.defaultdict. Strength: More readable and slightly more efficient than a plain dictionary. Weakness: Requires an import from the collections module (part of the standard library, so a minor cost).
  • Method 3: Using collections.Counter. Strength: Designed explicitly for counting, so it is efficient and very readable. Weakness: Requires an import from collections, but since that is part of Python's standard library, this is not significant.
  • Method 4: Using Python’s Dictionary Comprehension. Strength: Compact and Pythonic. Weakness: Potentially less efficient due to use of list.count() inside a comprehension.
  • Bonus Method 5: Using map() and dict(). Strength: Very concise one-liner. Weakness: Less readable, and inefficient on large inputs because the string is re-split and rescanned for each unique word.