5 Best Ways to Find Common Words in Two Strings in Python

💡 Problem Formulation: Imagine needing to compare two textual documents or strings to extract the common vocabulary. For example, given two strings, “apple orange banana” and “banana kiwi orange”, we wish to output a set or list of the words they share, in this case: “orange” and “banana”. This article provides solutions for identifying commonalities in text data, which is fundamental in text processing tasks such as document comparison, plagiarism detection, or data deduplication.

Method 1: Using Set Intersection

This method involves converting both strings into sets of words and then finding the intersection. The split() function breaks down the strings into words, and the set data structure stores unique words. The intersection of sets contains only the common elements between the two sets.

Here’s an example:

str1 = "apple orange banana"
str2 = "banana kiwi orange"
common_words = set(str1.split()) & set(str2.split())
print(common_words)

Output:

{'banana', 'orange'}

This code snippet demonstrates how to find common words efficiently by utilizing Python’s built-in set operations. By converting the strings into sets, we easily determine overlapping words with a single intersection operation.
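Because sets are unordered, the order in which the words print can differ from the output shown above. Here is a minimal sketch that returns the shared words sorted, so the result is reproducible. The helper name `common_words` is an assumption made for illustration:

```python
def common_words(text_a, text_b):
    """Return the words shared by both strings, sorted alphabetically."""
    # intersection() accepts any iterable, so only one explicit set is needed
    shared = set(text_a.split()).intersection(text_b.split())
    return sorted(shared)


print(common_words("apple orange banana", "banana kiwi orange"))
# ['banana', 'orange']
```

Sorting costs a little extra time, but it makes the output stable for tests and logs.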

Method 2: Using List Comprehension

With list comprehension, we can traverse each word in the first string and check if it also exists in the second string. Unlike the set-based approach, the result keeps the word order of the first string (and any repeated words), and the syntax stays concise and readable. It is best suited to small inputs.

Here’s an example:

str1 = "apple orange banana"
str2 = "banana kiwi orange"
common_words = [word for word in str1.split() if word in str2.split()]
print(common_words)

Output:

['orange', 'banana']

The list comprehension iterates through the first list of words and includes a word in the resultant list if it is also found in the second list. This approach is direct and straightforward, but it scales poorly: str2.split() is re-evaluated for every word, and each membership test scans the second list from the start, so the work grows with the product of the two word counts.
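For larger inputs, the same idea can be written as a loop that looks words up in a set and skips words it has already collected. This is only a sketch of one possible variant; the helper name `common_in_order` is hypothetical:

```python
def common_in_order(text_a, text_b):
    """Shared words in the order they first appear in text_a, without duplicates."""
    words_b = set(text_b.split())  # built once; membership checks are O(1) on average
    seen = set()
    result = []
    for word in text_a.split():
        if word in words_b and word not in seen:
            seen.add(word)
            result.append(word)
    return result


print(common_in_order("apple orange banana orange", "banana kiwi orange"))
# ['orange', 'banana']
```

The second "orange" in the first string is dropped because it has already been collected.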

Method 3: Using a Counter

The collections.Counter class in Python counts how often each word occurs in a string. Intersecting two counters with the & operator keeps only the words present in both strings, each with the smaller of its two counts. It’s a good method when the strings contain duplicated words and you care about how many occurrences they share.

Here’s an example:

from collections import Counter

str1 = "apple orange banana"
str2 = "banana kiwi orange"
# & keeps each shared word with the minimum of its two counts
counter = Counter(str1.split()) & Counter(str2.split())
common_words = list(counter)
print(common_words)

Output:

['orange', 'banana']

In this snippet, we use Counter from Python’s collections module to tally the occurrences of each word in both strings, and the & operator to keep only the words found in both. Note that summing the two counters and filtering on a count greater than one is not a safe shortcut, since a word repeated within a single string would pass that test too. This method is particularly useful when dealing with multiple occurrences of words and provides a frequency-based approach to the problem.
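When words repeat, the & operator on two Counter objects keeps each shared word with the smaller of its two counts. A small sketch with made-up inputs shows the effect:

```python
from collections import Counter

str1 = "apple apple orange banana"
str2 = "apple apple apple banana kiwi"

# & on Counters is a multiset intersection: min(count1, count2) per word
shared = Counter(str1.split()) & Counter(str2.split())

print(dict(shared))
# {'apple': 2, 'banana': 1}
print(sorted(shared.elements()))
# ['apple', 'apple', 'banana']
```

Here "orange" and "kiwi" drop out because each appears in only one string, while "apple" is kept twice, the number of occurrences both strings can account for.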

Method 4: Using Filter and Lambda Function

Python’s filter function can be combined with a lambda function to iterate over one list and filter out non-common words. It’s a more functional programming approach and can be used as an alternative to list comprehensions.

Here’s an example:

str1 = "apple orange banana"
str2 = "banana kiwi orange"
common_words = list(filter(lambda word: word in str2.split(), str1.split()))
print(common_words)

Output:

['orange', 'banana']

This code uses filter with a lambda function to check for common words. Here, the lambda function returns True for each word of the first string that also appears in the second, so filter drops the non-common words from the first list.
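As a possible refinement, the second string can be split once into a set and the set's own membership method passed to filter, which avoids re-splitting on every call. This is a sketch of that variation; the name `words2` is introduced here for illustration:

```python
str1 = "apple orange banana"
str2 = "banana kiwi orange"

words2 = set(str2.split())  # split once, reuse for every membership test

# A bound method works as the predicate, so no lambda is needed
common_words = list(filter(words2.__contains__, str1.split()))
print(common_words)
# ['orange', 'banana']
```

Whether `words2.__contains__` reads better than a lambda is a matter of taste; the lookup behaviour is the same as `word in words2`.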

Bonus One-Liner Method 5: Using Intersection with Set Comprehensions

Python’s set comprehensions provide a quick one-liner to find common words by constructing sets inline and performing an intersection in a single expression. With no transformation applied, {word for word in str1.split()} produces the same set as set(str1.split()), so the result matches Method 1. The comprehension form becomes useful when each word needs to be transformed or filtered while the set is built.

Here’s an example:

str1 = "apple orange banana"
str2 = "banana kiwi orange"
common_words = {word for word in str1.split()} & {word for word in str2.split()}
print(common_words)

Output:

{'orange', 'banana'}

With set comprehensions, this snippet builds a set from the words of each string and takes their intersection, providing an expressive and efficient way to find common elements.
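A set comprehension is most useful when the words need cleaning before comparison. The sketch below assumes we want to ignore case and surrounding punctuation; the helper `normalized_words` and the sample strings are made up for illustration:

```python
import string

str1 = "Apple, orange; Banana!"
str2 = "banana kiwi ORANGE."


def normalized_words(text):
    """Lowercase each word and strip surrounding punctuation."""
    return {
        word.strip(string.punctuation).lower()
        for word in text.split()
        if word.strip(string.punctuation)  # skip tokens that are only punctuation
    }


common_words = normalized_words(str1) & normalized_words(str2)
print(sorted(common_words))
# ['banana', 'orange']
```

Without the normalization step, "Banana!" and "banana" would be treated as different words and the intersection would be empty.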

Summary/Discussion

  • Method 1: Set Intersection. This method is straightforward and efficient for finding common words. It works best when duplicate words are not an issue and you need a fast, set-based solution.
  • Method 2: List Comprehension. The method is readable and concise but can be computationally intensive if the lists are large because it requires repeated traversal.
  • Method 3: Using a Counter. This is best for handling duplicates and getting word frequencies, but it may be overkill for simple commonality checks and is less efficient than using sets.
  • Method 4: Filter and Lambda Function. This method leverages functional programming principles, which could be a strength or weakness depending on your familiarity with functional idioms.
  • Method 5: Set Comprehensions Intersection. Concise and as efficient as Method 1, this one-liner is best for quick operations in an interactive environment. However, without a transformation inside the comprehension it offers nothing over set(), and it’s often less readable for those unfamiliar with comprehensions.
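To get a feel for the efficiency differences mentioned above, a rough timing sketch with the standard timeit module can help. The synthetic inputs and function names below are assumptions chosen for illustration, and the timings will vary by machine:

```python
import timeit

# Synthetic inputs (an assumption for illustration): 2,000 words each, half shared
str1 = " ".join(f"w{i}" for i in range(2000))
str2 = " ".join(f"w{i}" for i in range(1000, 3000))


def with_sets():
    return set(str1.split()) & set(str2.split())


def with_lists():
    words2 = str2.split()
    return [word for word in str1.split() if word in words2]


# Both approaches agree on which words are shared
assert with_sets() == set(with_lists())

# Timings depend on the machine, so no expected numbers are shown
print("sets :", timeit.timeit(with_sets, number=3))
print("lists:", timeit.timeit(with_lists, number=3))
```

On inputs of this size the list-based version is expected to be noticeably slower, since every membership test scans a list instead of hashing into a set.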