Problem Formulation
The goal is to determine how many times a specific word appears in a text file.
Given:
- A text file (example.txt) containing a body of text.
- A specific word to search for within this text (e.g., "Python").
Goal:
- Write a Python program that reads the content of example.txt.
- Count and return the number of times the specified word ("Python") appears in the text.
- The word comparison should be case-insensitive, meaning "Python", "python", and "PYTHON" would all be counted as occurrences of the same word.
- Words should be considered as sequences of characters separated by whitespace or punctuation marks. For instance, "Python," (with a comma) and "Python" (without a comma) should be treated as the same word.
Example: Consider the text file example.txt with the following content:
example.txt
Python is an interpreted, high-level and general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.
Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented, and functional programming. Python is often described as a "batteries included" language due to its comprehensive standard library.

If the word to search for is "Python", the program should output a count of 4: the word appears three times on its own and once in the possessive "Python's", where the apostrophe acts as a separator under the punctuation rule stated in the goal.
Method 1: Using the split() Function
The simplest way to count a specific word in a text file is by reading the file’s content into a string, converting it to lowercase (to make the search case-insensitive), and then using the split() function to break the string into words. After that, you can use the count() method to find the occurrences of the specified word.
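    def count_word_in_file(file_path, word):
        # Read the whole file and lowercase it for a case-insensitive match.
        with open(file_path, 'r') as file:
            text = file.read().lower()
        # split() with no arguments separates on any run of whitespace.
        words = text.split()
        return words.count(word.lower())

    print(count_word_in_file('example.txt', 'Python'))

This code opens the file example.txt in read mode, reads its content, and converts it to lowercase. Then, it splits the content into a list of words and counts how many times the specified word appears in the list. Because split() separates on whitespace only, tokens with attached punctuation such as "python," or "python's" are not counted, so this method reports 3 for the sample text.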
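To also count tokens that carry leading or trailing punctuation, each token can be trimmed before it is compared. The following is a minimal sketch, assuming that trimming the characters in string.punctuation from both ends is acceptable. The name count_word_split_stripped is hypothetical, and the function takes the text as a string rather than a file path.

```python
import string

def count_word_split_stripped(text, word):
    """Count whole-token matches after trimming punctuation from each token."""
    target = word.lower()
    # str.strip(chars) removes the given characters from both ends only,
    # so "python," becomes "python" but "python's" stays "python's".
    tokens = (token.strip(string.punctuation) for token in text.lower().split())
    return sum(1 for token in tokens if token == target)

sample = "Python, python! PYTHON's (python)"
print(count_word_split_stripped(sample, "Python"))  # prints 3
```

Possessives such as "Python's" are still missed, because the apostrophe sits inside the token rather than at its edge.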
Method 2: Using Regular Expressions
For more control over what constitutes a word (e.g., ignoring punctuation), you can use the re module. This approach allows you to define a word more accurately by using regular expressions.
    import re

    def count_word_in_file_regex(file_path, word):
        with open(file_path, 'r') as file:
            text = file.read().lower()
        # \b anchors the match at word boundaries; re.escape() neutralizes
        # any regex metacharacters in the search word.
        word_pattern = fr'\b{re.escape(word.lower())}\b'
        return len(re.findall(word_pattern, text))

    print(count_word_in_file_regex('example.txt', 'Python'))

Here, the re.findall() function returns all non-overlapping occurrences of the specified word, and the word boundaries (\b) ensure that "python" does not match inside a longer word such as "pythonic". re.escape() escapes the word so that it is treated as a literal string in the regular expression. For the sample text this returns 4, because the position between "Python" and the apostrophe in "Python's" is also a word boundary.
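One limitation of \b is that it only matches next to a word character, so a search term ending in punctuation, such as "C++", is missed in ordinary text. The sketch below shows one possible alternative, assuming lookarounds are an acceptable substitute for \b and that re.IGNORECASE stands in for the explicit lower() call. count_word_in_text is a hypothetical helper that works on a string.

```python
import re

def count_word_in_text(text, word):
    """Count case-insensitive occurrences of `word` not touching a word character."""
    # (?<!\w) and (?!\w) assert that no word character sits directly before
    # or after the match, which also works for terms that end in punctuation.
    pattern = rf'(?<!\w){re.escape(word)}(?!\w)'
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

print(count_word_in_text("Python, python's and PYTHON; not pythonic.", "python"))  # prints 3
print(count_word_in_text("C++ and c++ are fun", "C++"))  # prints 2
```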
Method 3: Using the collections.Counter Class
The collections module provides a Counter class that can be extremely useful for counting word frequencies in a text. This method involves reading the text, splitting it into words, and then passing the list of words to Counter to get a dictionary-like object where words are keys and their counts are values.
    from collections import Counter
    import re

    def count_word_in_file_counter(file_path, word):
        with open(file_path, 'r') as file:
            text = file.read().lower()
        # \w+ matches runs of letters, digits, and underscores, so punctuation is dropped.
        words = re.findall(r'\b\w+\b', text)
        word_counts = Counter(words)
        return word_counts[word.lower()]

    print(count_word_in_file_counter('example.txt', 'Python'))

This method uses a regular expression to extract the words, which leaves punctuation out. It then uses Counter to tally every word and returns the count of the specified one; a word that never appears yields 0 rather than an error. Note that \w+ splits "python's" into "python" and "s", so the sample text gives 4.
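Because Counter tallies every word in one pass, the same object can answer more than one query. A small sketch, assuming the text is already in memory; build_word_counts is a hypothetical helper, not part of the collections module:

```python
from collections import Counter
import re

def build_word_counts(text):
    """Return a Counter of the lowercase words found in `text`."""
    return Counter(re.findall(r'\b\w+\b', text.lower()))

counts = build_word_counts("Python is fun. Python is fast. Python wins.")
print(counts['python'])       # 3
print(counts['java'])         # 0: a missing key returns 0 instead of raising KeyError
print(counts.most_common(2))  # [('python', 3), ('is', 2)]
```

This trades memory for flexibility: every distinct word is stored, which pays off when several words are looked up in the same text.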
Method 4: Using a Loop and Dictionary
If you want to avoid importing any additional modules, you can manually count occurrences of each word using a loop and a dictionary. This method provides a good understanding of how word counting works under the hood.
    def count_word_in_file_dict(file_path, word):
        word_counts = {}
        with open(file_path, 'r') as file:
            for line in file:
                # The loop variable is named token so it does not overwrite
                # the word parameter that is looked up after the loop.
                for token in line.lower().split():
                    word_counts[token] = word_counts.get(token, 0) + 1
        return word_counts.get(word.lower(), 0)

    print(count_word_in_file_dict('example.txt', 'Python'))

This code reads the file line by line, splits each line into words, and uses a dictionary to keep track of word counts. The get() method supplies a default of 0 when a word isn't in the dictionary yet. As in Method 1, only whitespace separates words, so tokens with attached punctuation are tallied as separate entries.
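When only one word matters, the dictionary can be replaced by a single running total, which keeps memory use constant regardless of vocabulary size. A minimal sketch under the same whitespace-only tokenization; count_word_running_total is a hypothetical name, and it accepts any iterable of lines, including an open file object:

```python
def count_word_running_total(lines, word):
    """Count `word` across an iterable of lines without building a dictionary."""
    target = word.lower()
    total = 0
    for line in lines:
        # Compare each whitespace-separated token with the lowercase target.
        for token in line.lower().split():
            if token == target:
                total += 1
    return total

lines = ["Python is popular\n", "I like python a lot\n", "PYTHON python\n"]
print(count_word_running_total(lines, "Python"))  # prints 4
```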
Method 5: Using the pandas Library
For those who are working with data analysis, the pandas library can be a powerful tool for text processing. This method involves reading the entire file into a pandas DataFrame and then using pandas methods to count the word occurrences.
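    import csv
    import pandas as pd

    def count_word_in_file_pandas(file_path, word):
        # Each line becomes one row in column 0; QUOTE_NONE keeps quote
        # characters in the prose from being parsed as CSV quoting.
        df = pd.read_csv(file_path, sep='\t', header=None, quoting=csv.QUOTE_NONE)
        all_words = pd.Series(df[0].str.cat(sep=' ').lower().split())
        return all_words[all_words == word.lower()].count()

    print(count_word_in_file_pandas('example.txt', 'Python'))

This code reads the text file as a tab-separated file with a single column, which assumes the text contains no tab characters. It concatenates all lines into a single string, splits this string into words, and counts the occurrences of the specified word by filtering a pandas Series. Punctuation is not stripped here either, so the result matches Method 1.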
Bonus One-Liner Method 6: Using Path and Method Chaining
For a succinct approach, you can combine the Path object from the pathlib module with chained string methods. The whole function body is a single expression.
    from pathlib import Path

    def count_word_in_file_oneliner(file_path, word):
        return Path(file_path).read_text().lower().split().count(word.lower())

    print(count_word_in_file_oneliner('example.txt', 'Python'))

This method reads the file content as a string, converts it to lowercase, splits it into words, and counts the occurrences of the specified word, all in one expression.
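The one-liner tokenizes on whitespace only, so its result can differ from a word-boundary count on the same file. The sketch below compares the two on a temporary file; count_split and count_regex are hypothetical names, and the explicit encoding='utf-8' is an assumption about the input file.

```python
import re
import tempfile
from pathlib import Path

def count_split(file_path, word):
    # Whitespace-only tokenization, as in the one-liner.
    return Path(file_path).read_text(encoding='utf-8').lower().split().count(word.lower())

def count_regex(file_path, word):
    # Word-boundary matching, still reading the file through Path.
    text = Path(file_path).read_text(encoding='utf-8')
    return len(re.findall(rf'\b{re.escape(word)}\b', text, flags=re.IGNORECASE))

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / 'sample.txt'
    sample_path.write_text("Python is fun. Python's syntax is clean, says Python.",
                           encoding='utf-8')
    print(count_split(sample_path, 'Python'))  # prints 1
    print(count_regex(sample_path, 'Python'))  # prints 3
```

Which count is right depends on the definition of a word; under the problem statement given earlier, the word-boundary result is the intended one.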
