5 Best Ways to Implement UNIX Filename Pattern Matching in Python with fnmatch

💡 Problem Formulation: When working in a UNIX-like environment or dealing with file systems, you often need to match filenames against a pattern, for example to find all files with the ‘.txt’ extension in a directory. A simple yet powerful way to do this is to filter filenames with glob-like rules, as in shell scripting. This article looks at Pythonic ways to do that, where the input is a list of filenames and the output is the filtered list matching our criteria.

Method 1: Using fnmatch.fnmatch

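This method uses the fnmatch.fnmatch function from the fnmatch module, which compares a single filename to a pattern and returns a boolean indicating a match. The patterns use UNIX shell-style wildcards: * matches any run of characters, ? matches exactly one character, and [seq] or [!seq] matches any single character that is in or not in seq.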

Here’s an example:

import fnmatch

# List of filenames
filenames = ['data1.txt', 'data2.csv', 'image.png', 'report.txt', 'summary.pdf']

# Pattern to match
pattern = '*.txt'

# Filtering filenames
matched_files = [f for f in filenames if fnmatch.fnmatch(f, pattern)]

print(matched_files)

Output:

['data1.txt', 'report.txt']

This code snippet iterates over a list of filenames and applies the fnmatch.fnmatch function to each filename with the given pattern. Files ending with ‘.txt’ are matched and added to the matched_files list, which is printed as the output.
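The other shell-style wildcards work the same way. Here’s a minimal sketch with made-up filenames (they are assumptions for illustration only). It uses fnmatch.fnmatchcase, the always case-sensitive variant of fnmatch.fnmatch, so the results do not depend on the operating system:

```python
import fnmatch

# Hypothetical sample names, chosen only to exercise each wildcard
names = ['log1.txt', 'log2.txt', 'log10.txt', 'logA.txt', 'notes.md']

# '?' matches exactly one character
single_char = [n for n in names if fnmatch.fnmatchcase(n, 'log?.txt')]

# '[0-9]' matches one character from the given range
digit_only = [n for n in names if fnmatch.fnmatchcase(n, 'log[0-9].txt')]

# '[!0-9]' matches one character that is NOT in the range
not_digit = [n for n in names if fnmatch.fnmatchcase(n, 'log[!0-9].txt')]

print(single_char)  # ['log1.txt', 'log2.txt', 'logA.txt']
print(digit_only)   # ['log1.txt', 'log2.txt']
print(not_digit)    # ['logA.txt']
```

Note that ‘log10.txt’ is never matched, because each of these wildcards stands for exactly one character.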

Method 2: Using fnmatch.filter

The fnmatch.filter function takes a list of filenames and a pattern, returning a list of filenames that match the pattern. Unlike fnmatch.fnmatch, filter processes a list of names and is typically more concise for bulk operations.

Here’s an example:

import fnmatch

# List of filenames
filenames = ['data1.txt', 'data2.csv', 'image.png', 'report.txt', 'summary.pdf']

# Pattern to match
pattern = '*.txt'

# Get matched filenames
matched_files = fnmatch.filter(filenames, pattern)

print(matched_files)

Output:

['data1.txt', 'report.txt']

This code snippet uses fnmatch.filter to directly obtain the list of filenames that match the provided UNIX shell-style pattern, so no list comprehension is needed.
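fnmatch.filter accepts only one pattern per call. If you need to match against several patterns at once, one possible approach is a small helper like the sketch below. The name filter_any is hypothetical and not part of the fnmatch module:

```python
import fnmatch

def filter_any(names, patterns):
    """Return the names that match at least one pattern, keeping input order.

    filter_any is a hypothetical helper, not a function from the fnmatch module.
    """
    return [n for n in names if any(fnmatch.fnmatch(n, p) for p in patterns)]

filenames = ['data1.txt', 'data2.csv', 'image.png', 'report.txt', 'summary.pdf']

# Keep both text and CSV files
matched_files = filter_any(filenames, ['*.txt', '*.csv'])

print(matched_files)  # ['data1.txt', 'data2.csv', 'report.txt']
```

Checking each name against all patterns, rather than calling fnmatch.filter once per pattern, keeps the original order and avoids duplicates when a name matches more than one pattern.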

Method 3: Using fnmatch with Case Sensitivity

The fnmatch.fnmatch function normalizes both the filename and the pattern with os.path.normcase, so the comparison is case-insensitive on Windows and case-sensitive on POSIX systems. The module has no flag to override this. Its sibling fnmatch.fnmatchcase always compares case-sensitively, so for a case-insensitive match that behaves the same everywhere, you can lowercase both the filename and the pattern before calling it.

Here’s an example:

import fnmatch

# List of filenames
filenames = ['README.TXT', 'setup.py', 'INSTALL.MD', 'config.cfg']

# Pattern to match, ignoring case
pattern = '*.txt'

# Lowercase both sides so the match ignores case on every platform
matched_files = [f for f in filenames if fnmatch.fnmatchcase(f.lower(), pattern.lower())]

print(matched_files)

Output:

['README.TXT']

This snippet demonstrates case-insensitive matching that does not depend on the operating system. It is helpful when the filesystem or naming conventions do not enforce a strict case policy.
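One portable way to ignore case is to convert the pattern into a regular expression with fnmatch.translate and compile it with re.IGNORECASE. Here’s a minimal sketch of that idea; the filenames are made up for illustration:

```python
import fnmatch
import re

# Hypothetical sample names
filenames = ['README.TXT', 'notes.txt', 'setup.py', 'INSTALL.MD']

# translate() turns the shell-style pattern into a regular expression string;
# compiling it with re.IGNORECASE makes the match ignore case on any platform
regex = re.compile(fnmatch.translate('*.txt'), re.IGNORECASE)

matched_files = [f for f in filenames if regex.match(f)]

print(matched_files)  # ['README.TXT', 'notes.txt']
```

Compiling the pattern once is also convenient when the same pattern is applied to many filenames.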

Method 4: Combining fnmatch with os.walk for Directory Traversal

The os.walk function can be combined with fnmatch to apply pattern matching to files within a directory tree, recursively. This method is useful for more complex search operations across directories.

Here’s an example:

import fnmatch
import os

# Directory to start search
search_dir = '.'

# Pattern to match
pattern = '*.py'

# Recursive search with pattern matching
matched_files = []
for dirpath, dirnames, files in os.walk(search_dir):
    for filename in fnmatch.filter(files, pattern):
        matched_files.append(os.path.join(dirpath, filename))

print(matched_files)

Output (depends on the files in your directory tree):

['./setup.py', './scripts/run.py']

This code snippet recursively traverses the directory tree starting from ‘.’, matches files ending with ‘.py’, and appends the matches to the matched_files list with their corresponding path.
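On large trees you may want to skip some directories and avoid building the whole list in memory. The sketch below is one way to do that, assuming a hypothetical generator named find_files with a hypothetical skip_dirs parameter. It relies on the documented os.walk behavior that editing dirnames in place prunes the traversal, and it builds a throwaway directory tree so it can run anywhere:

```python
import fnmatch
import os
import tempfile

def find_files(root, pattern, skip_dirs=()):
    """Yield paths under root whose filename matches pattern.

    find_files and skip_dirs are hypothetical names for this sketch;
    skip_dirs holds directory-name patterns that should not be entered.
    """
    for dirpath, dirnames, files in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them
        dirnames[:] = [d for d in dirnames
                       if not any(fnmatch.fnmatch(d, s) for s in skip_dirs)]
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(dirpath, filename)

# Build a small temporary tree so the example does not depend on your files
with tempfile.TemporaryDirectory() as root:
    for rel in ['setup.py', 'scripts/run.py', '.venv/lib.py', 'README.md']:
        path = os.path.join(root, *rel.split('/'))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, 'w').close()

    # Skip hidden directories such as '.venv'; show paths relative to root
    found = sorted(os.path.relpath(p, root).replace(os.sep, '/')
                   for p in find_files(root, '*.py', skip_dirs=['.*']))

print(found)  # ['scripts/run.py', 'setup.py']
```

Unlike the shell, fnmatch does not treat a leading dot specially, which is why the ‘.*’ pattern matches ‘.venv’ here.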

Bonus One-Liner Method 5: Using glob.glob

A more concise approach is the glob.glob function, which returns all pathnames matching a specified pattern according to the rules used by the UNIX shell. It can be seen as a combination of fnmatch and directory listing in a single call. By default it does not descend into subdirectories, and, unlike fnmatch, a * wildcard does not match filenames that start with a dot.

Here’s an example:

import glob

# Pattern to match
pattern = '*.py'

# One-liner to get all matched files
matched_files = glob.glob(pattern)

print(matched_files)

Output (depends on the files in your current directory; the order is not guaranteed):

['setup.py', 'app.py', 'test.py']

This example shows how to use glob.glob to quickly get a list of all Python files in the current directory with minimal code.
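glob.glob can also search subdirectories when the pattern contains ‘**’ and recursive=True is passed. Here’s a minimal sketch that compares the two modes on a temporary directory tree; the file names are made up for illustration:

```python
import glob
import os
import tempfile

def relative(paths, root):
    """Return sorted paths relative to root, using '/' as the separator."""
    return sorted(os.path.relpath(p, root).replace(os.sep, '/') for p in paths)

# Build a small temporary tree so the example does not depend on your files
with tempfile.TemporaryDirectory() as root:
    for rel in ['app.py', 'pkg/util.py', 'pkg/sub/deep.py', 'notes.txt']:
        path = os.path.join(root, *rel.split('/'))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, 'w').close()

    # Without recursion, '*.py' only sees the top-level directory
    top_level = relative(glob.glob(os.path.join(root, '*.py')), root)

    # '**' with recursive=True matches zero or more nested directories
    everything = relative(
        glob.glob(os.path.join(root, '**', '*.py'), recursive=True), root)

print(top_level)   # ['app.py']
print(everything)  # ['app.py', 'pkg/sub/deep.py', 'pkg/util.py']
```

The results are sorted here because glob.glob makes no promise about the order in which paths are returned.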

Summary/Discussion

  • Method 1: fnmatch.fnmatch. Best for single filename evaluation. Simple to use for individual checks but requires iteration for lists of files.
  • Method 2: fnmatch.filter. Efficient for bulk pattern matching on lists. Directly returns the matched filenames, which may be faster than a manual list comprehension.
  • Method 3: fnmatch with Case Sensitivity. fnmatch.fnmatch follows the platform’s case rules and fnmatch.fnmatchcase is always case-sensitive; lowercasing both the filename and the pattern gives a portable case-insensitive match. Adds flexibility but is often overlooked.
  • Method 4: fnmatch with os.walk for Recursive Directory Traversal. Powerful for complete searches within directory trees. More complex and not required for simple file list filtering.
  • Method 5: glob.glob. A blend of fnmatch pattern matching with the simplicity of glob’s filesystem navigation. Ideal for concise pattern matching in directory searches.