How to Count a Specific Word in a Text File in Python?

Problem Formulation

💡 Problem Formulation: The goal is to determine how many times a word appears throughout the text.

Given:

  • A text file (example.txt) containing a body of text.
  • A specific word to search for within this text (e.g., "Python").

Goal:

  • Write a Python program that reads the content of example.txt.
  • Counts and returns the number of times the specified word ("Python") appears in the text.
  • The word comparison should be case-insensitive, meaning "Python", "python", and "PYTHON" would all be counted as occurrences of the same word.
  • Words should be considered as sequences of characters separated by whitespace or punctuation marks. For instance, "Python," (with a comma) and "Python" (without a comma) should be treated as the same word.

Example: Consider the text file example.txt with the following content:

💾 example.txt

Python is an interpreted, high-level and general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.
Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented, and functional programming. Python is often described as a "batteries included" language due to its comprehensive standard library.

If the word to search for is "Python", the program should output 4: "Python" appears four times in the text, once as part of "Python's", which the whitespace-or-punctuation rule splits into "Python" and "s".

Method 1: Using the str.split() Method

The simplest way to count a specific word in a text file is to read the file’s content into a string, convert it to lowercase (to make the search case-insensitive), and break it into words with str.split(). Called without arguments, split() splits on any run of whitespace. You can then use the list's count() method to find how many times the specified word occurs.

def count_word_in_file(file_path, word):
    with open(file_path, 'r') as file:
        text = file.read().lower()
    words = text.split()
    return words.count(word.lower())

print(count_word_in_file('example.txt', 'Python'))

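This code opens example.txt in read mode, reads its content, and converts it to lowercase. It then splits the content into a list of words and counts how many times the lowercased search word appears in that list. Because split() breaks the text only on whitespace, punctuation stays attached to words: "python," and "python's" are not equal to "python", so for the sample file this function prints 3 rather than 4.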
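If you want to stay with plain string methods but still follow the punctuation rule from the problem formulation, one option is to turn every punctuation character into a space before splitting. The sketch below assumes that string.punctuation (ASCII punctuation only) covers the punctuation in your file; count_word_in_text and PUNCT_TO_SPACE are hypothetical names used for illustration.

```python
import string

# Hypothetical translation table: maps every ASCII punctuation character to a space
PUNCT_TO_SPACE = str.maketrans(string.punctuation, ' ' * len(string.punctuation))

def count_word_in_text(text, word):
    # Lowercase, replace punctuation with spaces, then split on whitespace
    words = text.lower().translate(PUNCT_TO_SPACE).split()
    return words.count(word.lower())

sample = 'Python, python\'s design; PYTHON. "Pythonic" is a different word.'
print(count_word_in_text(sample, 'Python'))  # 3
```

To use it on a file, pass in the text read from example.txt; for the sample file it counts 4. Note that hyphenated words are split as well, so "high-level" becomes "high" and "level".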

Method 2: Using Regular Expressions

For more control over what constitutes a word (e.g., ignoring punctuation), you can use the re module. This approach allows you to define a word more accurately by using regular expressions.

import re

def count_word_in_file_regex(file_path, word):
    with open(file_path, 'r') as file:
        text = file.read().lower()
    word_pattern = fr'\b{re.escape(word.lower())}\b'
    return len(re.findall(word_pattern, text))

print(count_word_in_file_regex('example.txt', 'Python'))

Here, re.findall() returns all non-overlapping matches of the pattern. The \b anchors match only at a boundary between a word character and a non-word character (or the start or end of the text), so "python," and "python's" are counted while "pythonic" is not; for the sample file this gives 4. re.escape() escapes any regex metacharacters in the search word, making sure it’s treated as a literal string in the regular expression.
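To see why the word boundaries matter, you can compare the regex count with a plain substring count on a small made-up string. This sketch uses the re.IGNORECASE flag instead of lowercasing the text, which is an alternative way to get case-insensitive matching; count_whole_word is a hypothetical helper name.

```python
import re

def count_whole_word(text, word):
    # \b...\b only matches the word where it is not part of a longer word;
    # re.IGNORECASE makes the match case-insensitive without lowercasing the text
    pattern = rf'\b{re.escape(word)}\b'
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

sample = "Python, python's PYTHON. Pythonic code is not counted."
print(count_whole_word(sample, 'Python'))   # 3
print(sample.lower().count('python'))       # 4: substring count also hits "pythonic"
```

One caveat: \b behaves as expected only when the search word starts and ends with a word character. For a term like "C++", the trailing \b would require a letter, digit, or underscore right after the last "+", so "C++ is" would not match.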

Method 3: Using the collections.Counter Class

The collections module provides a Counter class that can be extremely useful for counting word frequencies in a text. This method involves reading the text, splitting it into words, and then passing the list of words to Counter to get a dictionary-like object where words are keys and their counts are values.

from collections import Counter
import re

def count_word_in_file_counter(file_path, word):
    with open(file_path, 'r') as file:
        text = file.read().lower()
    words = re.findall(r'\b\w+\b', text)
    word_counts = Counter(words)
    return word_counts[word.lower()]

print(count_word_in_file_counter('example.txt', 'Python'))

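This method uses the pattern \b\w+\b to extract runs of word characters (letters, digits, and underscores) from the lowercased text, so punctuation is dropped and "python's" becomes "python" and "s". Counter then tallies every word, and indexing it with the lowercased search word returns that word's count, or 0 if the word never appears. For the sample file this gives 4.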
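The advantage of Counter shows up when you need more than one lookup: build the counts once, then query as many words as you like or list the most frequent ones. The sketch below works on an in-memory string; build_word_counts is a hypothetical helper, and it uses r'\w+', which extracts the same words as r'\b\w+\b'.

```python
import re
from collections import Counter

def build_word_counts(text):
    # Lowercase once and extract runs of word characters, as in Method 3
    return Counter(re.findall(r'\w+', text.lower()))

sample = "Python is fun. Python's syntax is clean; is it?"
counts = build_word_counts(sample)
print(counts['python'])        # 2
print(counts['java'])          # 0: missing keys default to zero
print(counts.most_common(2))   # [('is', 3), ('python', 2)]
```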

Method 4: Using a Loop and Dictionary

If you want to avoid importing any additional modules, you can manually count occurrences of each word using a loop and a dictionary. This method provides a good understanding of how word counting works under the hood.

def count_word_in_file_dict(file_path, word):
    word_counts = {}
    with open(file_path, 'r') as file:
        for line in file:
            # Use a separate loop variable so the 'word' parameter is not overwritten
            for token in line.lower().split():
                word_counts[token] = word_counts.get(token, 0) + 1
    return word_counts.get(word.lower(), 0)

print(count_word_in_file_dict('example.txt', 'Python'))

This code reads the file line by line, splits each line into words, and uses a dictionary to keep track of word counts. The get() method is used to update counts, providing a default of 0 if the word isn’t already in the dictionary.
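If you only ever need the count for one word, the dictionary of every word's count isn't strictly required. A minimal sketch of that simplification keeps a single running total while reading line by line; count_word_in_lines is a hypothetical name, it accepts any iterable of lines (such as an open file object), and it uses the same whitespace-only splitting as above.

```python
def count_word_in_lines(lines, word):
    # Lowercase the search word once, then add up matches line by line
    target = word.lower()
    total = 0
    for line in lines:
        total += line.lower().split().count(target)
    return total

sample_lines = ["Python is fun.\n", "I like python and PYTHON\n", "Python, again\n"]
print(count_word_in_lines(sample_lines, 'Python'))  # 3 ("python," is not matched)
```

With a real file, pass the open file object directly, for example inside with open('example.txt') as f: call count_word_in_lines(f, 'Python'). Only one line is held in memory at a time.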

Method 5: Using the pandas Library

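If you already use pandas for data analysis, you can lean on it for this task too. This method reads the file into a pandas DataFrame with one row per line and then uses pandas methods to count the word occurrences.

import csv
import pandas as pd

def count_word_in_file_pandas(file_path, word):
    # Read each line as one string field: no quote handling, no type inference,
    # and no conversion of strings such as "NA" into missing values
    df = pd.read_csv(file_path, sep='\t', header=None, quoting=csv.QUOTE_NONE,
                     dtype=str, keep_default_na=False)
    # Join all lines, lowercase, and split on whitespace into a Series of words
    all_words = pd.Series(df[0].str.cat(sep=' ').lower().split())
    return all_words[all_words == word.lower()].count()

print(count_word_in_file_pandas('example.txt', 'Python'))

This code reads the text file as if it were a CSV file with a single tab-separated column, so each line becomes one row. It then concatenates all lines into a single string, splits that string on whitespace, and counts the occurrences of the specified word using pandas Series methods. Because the file is parsed as tab-separated data, a tab character inside a line would be treated as a column separator, so this trick suits plain prose without tabs.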

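Bonus One-Liner Method 6: Using pathlib.Path and Method Chaining

For a succinct approach, you can chain the read_text() method of a Path object from the pathlib module with the same string and list methods used in Method 1. The one-liner is concise, but it reads the whole file into memory at once, which is fine for typical text files.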

from pathlib import Path

def count_word_in_file_oneliner(file_path, word):
    return Path(file_path).read_text().lower().split().count(word.lower())

print(count_word_in_file_oneliner('example.txt', 'Python'))

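This method reads the file content as a string, converts it to lowercase, splits it into words, and counts the occurrences of the specified word, all in one line. Like Method 1, it splits only on whitespace, so punctuation stays attached to words and "Python's" in the sample file is not counted.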
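If you want to keep the one-line style but also ignore punctuation, one option is to swap split().count() for the word-boundary regex from Method 2. The function name below is hypothetical, and the demo writes a throwaway file with tempfile so the example runs on its own.

```python
import re
import tempfile
from pathlib import Path

def count_word_in_file_regex_oneliner(file_path, word):
    # Whole-word, case-insensitive count in a single expression
    return len(re.findall(rf'\b{re.escape(word)}\b',
                          Path(file_path).read_text(encoding='utf-8'),
                          flags=re.IGNORECASE))

# Demo: write a small throwaway file so the example is self-contained
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = Path(tmp_dir) / 'demo.txt'
    demo_file.write_text("Python's design: Python, PYTHON and Pythonic.", encoding='utf-8')
    result = count_word_in_file_regex_oneliner(demo_file, 'Python')

print(result)  # 3
```

Applied to the sample example.txt, this version counts 4, matching the expected result from the problem formulation.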