5 Best Ways to Perform Unix-Style Pathname Pattern Expansion in Python with glob
💡 Problem Formulation: When working with files, a common task is to search for files and directories in the file system using pattern matching. Suppose you have a bunch of text files and wish to find all files that follow a specific naming pattern (like ‘report-*.txt’). The desired output is a list of paths to files that match this pattern.
Method 1: Basic Pattern Matching with glob.glob()
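The glob.glob() function returns a list of pathnames that match a pattern, following the rules used by the Unix shell: ‘*’ matches any number of characters, ‘?’ matches a single character, and ‘[ ]’ matches one character from a set. Unless you ask for a recursive search, it only looks in the directories named in the pattern and does not descend into subdirectories.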
Here’s an example:
import glob
file_list = glob.glob('path/to/reports/report-*.txt')
print(file_list)
Output:
['path/to/reports/report-001.txt', 'path/to/reports/report-002.txt', ...]
This snippet lists every file in the ‘path/to/reports’ directory whose name starts with ‘report-’ and ends with ‘.txt’. The returned paths keep the same prefix as the pattern, so a relative pattern yields relative paths.
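If you want to try this without touching your own files, here is a minimal, self-contained sketch. The file names and the temporary directory are made up for illustration. It also sorts the result, because glob.glob() makes no promise about the order of the returned paths, and it uses glob.escape() so that any special characters in the directory name are treated literally.

```python
import glob
import os
import tempfile

# Build a throwaway directory with a few hypothetical files to match against.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("report-002.txt", "report-001.txt", "notes.txt", "report-003.csv"):
        open(os.path.join(tmp_dir, name), "w").close()

    # Escape the directory part so only 'report-*.txt' is treated as a pattern.
    pattern = os.path.join(glob.escape(tmp_dir), "report-*.txt")

    # The result order is not guaranteed, so sort for a stable listing.
    matches = sorted(glob.glob(pattern))
    names = [os.path.basename(path) for path in matches]

print(names)
# ['report-001.txt', 'report-002.txt']
```

Only the two files that satisfy both the ‘report-’ prefix and the ‘.txt’ suffix come back.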
Method 2: Recursive Pattern Matching with glob.glob() using ‘**’
When the pattern contains ‘**’ and the ‘recursive’ parameter is set to True, glob.glob() matches files in the starting directory and in its subdirectories at any depth.
Here’s an example:
import glob
file_list = glob.glob('path/to/reports/**/*.txt', recursive=True)
print(file_list)
Output:
['path/to/reports/report-001.txt', 'path/to/reports/subdir/report-002.txt', ...]
This code retrieves all ‘.txt’ files in the ‘path/to/reports’ directory and all of its subdirectories recursively.
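To see what the ‘recursive’ flag actually changes, the following sketch runs the same pattern twice on a small, made-up directory tree (the directory and file names are assumptions for the demo). Without recursive=True, ‘**’ behaves like a plain ‘*’ and matches exactly one directory level.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical layout: top.txt, subdir/mid.txt, subdir/deeper/low.txt
    os.makedirs(os.path.join(tmp_dir, "subdir", "deeper"))
    for rel_path in (
        "top.txt",
        os.path.join("subdir", "mid.txt"),
        os.path.join("subdir", "deeper", "low.txt"),
    ):
        open(os.path.join(tmp_dir, rel_path), "w").close()

    pattern = os.path.join(glob.escape(tmp_dir), "**", "*.txt")

    # With recursive=True, '**' matches zero or more directory levels.
    recursive_names = sorted(
        os.path.basename(p) for p in glob.glob(pattern, recursive=True)
    )

    # Without it, '**' acts like '*': exactly one directory level.
    flat_names = sorted(os.path.basename(p) for p in glob.glob(pattern))

print(recursive_names)  # ['low.txt', 'mid.txt', 'top.txt']
print(flat_names)       # ['mid.txt']
```

Forgetting the flag does not raise an error, it just returns fewer results, so it is worth double-checking when a recursive search looks incomplete.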
Method 3: Case-Insensitive Pattern Matching
Python’s glob does not support case-insensitive matching by default. However, you can create a pattern that matches both uppercase and lowercase characters by using character classes [ ] in the pattern string.
Here’s an example:
import glob
file_list = glob.glob('path/to/reports/report-[0-9]*.[Tt][Xx][Tt]')
print(file_list)
Output:
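['path/to/reports/report-001.txt', 'path/to/reports/report-002.TXT', ...]
This code fetches files whose ‘.txt’ extension is written in lowercase, uppercase, or any mix of the two. Note that only the extension is made case-insensitive here: on a case-sensitive file system the ‘report-’ prefix must still be lowercase to match.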
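Writing the character classes by hand gets tedious for longer names. One possible approach is a small helper that builds the pattern for you. The function name case_insensitive_pattern is hypothetical and not part of the glob module, and this sketch assumes a plain pattern with ASCII letters and no existing ‘[ ]’ classes. The check uses fnmatch.fnmatchcase(), which applies the same wildcard rules to a bare file name.

```python
import fnmatch

def case_insensitive_pattern(pattern: str) -> str:
    """Replace each ASCII letter with a two-character class, e.g. 't' -> '[tT]'.

    Hypothetical helper: it does not handle patterns that already
    contain '[...]' character classes.
    """
    return "".join(
        f"[{ch.lower()}{ch.upper()}]" if ch.isascii() and ch.isalpha() else ch
        for ch in pattern
    )

filename_pattern = case_insensitive_pattern("report-*.txt")
print(filename_pattern)
# [rR][eE][pP][oO][rR][tT]-*.[tT][xX][tT]

# fnmatchcase() always compares case-sensitively, so the classes do the work.
print(fnmatch.fnmatchcase("REPORT-002.TXT", filename_pattern))  # True
print(fnmatch.fnmatchcase("summary.txt", filename_pattern))     # False
```

The generated string can be joined onto a directory and passed to glob.glob() like any other pattern.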
Method 4: Using glob.iglob() Iterator
The glob.iglob() function works like glob.glob() but instead of returning a list, it returns an iterator. This is more memory efficient for large sets of results.
Here’s an example:
import glob
for filename in glob.iglob('path/to/reports/report-*.txt'):
    print(filename)
Output:
path/to/reports/report-001.txt
path/to/reports/report-002.txt
...
This snippet prints each pathname one by one without storing them all simultaneously, suitable for when there are many matching files.
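Because glob.iglob() returns an iterator, you can also stop early instead of collecting every match. The sketch below takes only the first five results with itertools.islice(). The 100 dummy files and their names are assumptions made for the demo.

```python
import glob
import itertools
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create 100 hypothetical report files.
    for i in range(1, 101):
        open(os.path.join(tmp_dir, f"report-{i:03d}.txt"), "w").close()

    pattern = os.path.join(glob.escape(tmp_dir), "report-*.txt")

    matches = glob.iglob(pattern)  # an iterator, not a list

    # Consume only the first five matches and leave the rest untouched.
    first_five = list(itertools.islice(matches, 5))

print(len(first_five))  # 5
```

Which five paths you get is not guaranteed, since the iteration order is arbitrary.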
Bonus One-Liner Method 5: List Comprehension with glob.glob()
You can use list comprehension to immediately use or transform the results provided by glob.glob().
Here’s an example:
import glob
file_list = [filename.upper() for filename in glob.glob('path/to/reports/report-*.txt')]
print(file_list)
Output:
['PATH/TO/REPORTS/REPORT-001.TXT', 'PATH/TO/REPORTS/REPORT-002.TXT', ...]
This line of code changes the pathnames to uppercase after retrieving them, demonstrating inline processing of the results.
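A list comprehension can also filter the results while transforming them. As a rough sketch, the example below keeps only the base names of non-empty files. The file names and contents are invented for the demo.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical files: one of them is empty.
    contents = {
        "report-001.txt": "data",
        "report-002.txt": "",
        "report-003.txt": "more data",
    }
    for name, text in contents.items():
        with open(os.path.join(tmp_dir, name), "w") as handle:
            handle.write(text)

    pattern = os.path.join(glob.escape(tmp_dir), "report-*.txt")

    # Transform (basename) and filter (size > 0) in a single expression.
    non_empty = [
        os.path.basename(path)
        for path in sorted(glob.glob(pattern))
        if os.path.getsize(path) > 0
    ]

print(non_empty)
# ['report-001.txt', 'report-003.txt']
```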
Summary/Discussion
- Method 1: Basic Pattern Matching. Simple and widely used. Limited to non-recursive searches.
- Method 2: Recursive Pattern Matching. Comprehensive search capability. May be slower due to recursion.
- Method 3: Case-Insensitive Matching. Useful for environments with uncertain case conventions. Requires manual pattern specification.
- Method 4: Using glob.iglob(). Memory efficient for processing large numbers of files. Involves iteration rather than direct list manipulation.
- Method 5: List Comprehension with glob.glob(). Efficient inline transformation of results. Less readable when complex operations are involved.
