Lazy as I am, I name all my temporary code files code.py when I try something in Python. Today, once again, I needed to find one of my code.py files because it contained a code snippet I was interested in.
Being a Python enthusiast, I wondered:
How can I find all files that match a certain pattern (e.g., a wildcard pattern or a regular expression) using Python and only Python?
This quick tutorial will show you how I did it — I hope you’ll find some use in it as well.
Find All Files on My Windows Machine That Match a Pattern
This code uses Python’s built-in glob module to retrieve a list of all files on the Windows C: drive, including all of its subdirectories, that are named code.py.
```python
import glob

# Get the list of all files matching the query
file_list = glob.glob('C:\\**\\code.py', recursive=True)

# Print the resulting list
print(file_list)
```
Setting the recursive parameter to True lets the ** wildcard descend into all subdirectories. The file_list variable stores the list of all matching file paths, which is then printed.
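Scanning a whole drive can take a while, and glob.glob only returns after the entire scan is finished. A minimal sketch of an alternative, assuming you’d rather see matches as they are found: glob.iglob accepts the same arguments but returns an iterator. The helper name find_files is hypothetical, and the usage part searches a throwaway temporary directory instead of C: so it runs on any OS.

```python
import glob
import os
import tempfile


def find_files(root, name):
    """Lazily yield every path under root whose file name matches name.

    find_files is a hypothetical helper; name may contain glob wildcards.
    """
    # os.path.join inserts the correct separator for the current OS
    pattern = os.path.join(root, '**', name)
    # iglob returns an iterator, so matches arrive one at a time
    # instead of after the whole tree has been scanned
    yield from glob.iglob(pattern, recursive=True)


# Usage: build a small throwaway tree and search it
with tempfile.TemporaryDirectory() as root:
    for sub in ('', 'a', os.path.join('a', 'b')):
        os.makedirs(os.path.join(root, sub), exist_ok=True)
        open(os.path.join(root, sub, 'code.py'), 'w').close()
    for path in find_files(root, 'code.py'):
        print(os.path.relpath(path, root))
```

The order of results follows the file system’s directory listing, so sort them if you need a stable order.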
As proof that I have a lot of code.py files on my computer, running this on my Windows machine printed a long list of matches.
Ironically, the code itself runs in yet another code.py file, haha. 🤯
Now, you may have some questions left:
What Does the Double Asterisk ** Mean?
The double asterisk (**) is a wildcard that matches across multiple levels of a directory structure. With recursive=True, it matches any files and zero or more directories and subdirectories. Without recursive=True, ** behaves just like a single *.
Why Didn’t I Use a Single Asterisk *?
The single asterisk (*) matches any characters within a single level of a directory structure: it only matches files and subdirectories inside the immediate directory and never crosses a path separator. To search across multiple levels, you need the double asterisk (**) together with recursive=True.
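To see the difference in practice, here is a small sketch that puts a code.py file at three depths of a temporary tree and runs both patterns. The helper compare_wildcards is hypothetical, not part of glob.

```python
import glob
import os
import tempfile


def compare_wildcards(root):
    """Return sorted (single_star, double_star) matches for code.py under root."""
    # '*' matches exactly one directory level: root/<dir>/code.py
    single = glob.glob(os.path.join(root, '*', 'code.py'))
    # '**' with recursive=True matches zero or more directory levels
    double = glob.glob(os.path.join(root, '**', 'code.py'), recursive=True)
    return sorted(single), sorted(double)


with tempfile.TemporaryDirectory() as root:
    # Files at depth 0, 1, and 2 below root
    os.makedirs(os.path.join(root, 'a', 'b'))
    for sub in ('.', 'a', os.path.join('a', 'b')):
        open(os.path.join(root, sub, 'code.py'), 'w').close()
    single, double = compare_wildcards(root)
    print([os.path.relpath(p, root) for p in single])  # only the file in a
    print([os.path.relpath(p, root) for p in double])  # all three files
```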
How to Match Other Patterns?
Strictly speaking, the glob module doesn’t use regular expressions at all. It uses Unix shell-style wildcards, which are simpler but cover many everyday cases.
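Besides *, you can use the question mark ? to match any single character and the [ ] character class to match any single character from a set or range, such as [0-9]. Unlike some shells, Python’s glob does not support { } curly braces for listing alternatives: a pattern like code{1,2}.py only matches a file with exactly that name.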
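The sketch below tries ? and [0-9] on a handful of files in a temporary directory. Because glob does not expand { } braces, it emulates them by globbing each alternative separately. The helper match_names is hypothetical and, for simplicity, looks at a single directory level only.

```python
import glob
import os
import tempfile


def match_names(root, pattern):
    """Return sorted names of entries directly inside root matching pattern."""
    # Hypothetical helper: no recursion, just one directory level
    return sorted(os.path.basename(p) for p in glob.glob(os.path.join(root, pattern)))


with tempfile.TemporaryDirectory() as root:
    for name in ('code.py', 'code1.py', 'code2.py', 'codeX.py',
                 'code12.py', 'main.py', 'test.py'):
        open(os.path.join(root, name), 'w').close()

    # ? matches exactly one character
    print(match_names(root, 'code?.py'))      # ['code1.py', 'code2.py', 'codeX.py']
    # [0-9] matches exactly one character from the range 0-9
    print(match_names(root, 'code[0-9].py'))  # ['code1.py', 'code2.py']

    # No brace expansion: this looks for a file literally named 'code{1,2}.py'
    print(match_names(root, 'code{1,2}.py'))  # []

    # Emulate {main,test}.py by globbing each alternative separately
    hits = [n for alt in ('main.py', 'test.py') for n in match_names(root, alt)]
    print(hits)  # ['main.py', 'test.py']
```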
For example, the pattern 'C:\\**\\code[0-9].py' (with recursive=True) matches any file named code, followed by a single digit, followed by .py, such as code1.py or code7.py.
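Because glob only understands wildcards, a true regular expression needs a different approach. A minimal sketch, assuming you’re happy to walk the directory tree yourself: combine os.walk with re.fullmatch from the standard library. This is a swapped-in technique, not a glob feature, and find_by_regex is a hypothetical helper name.

```python
import os
import re
import tempfile


def find_by_regex(root, regex):
    """Yield paths under root whose file name fully matches regex.

    find_by_regex is a hypothetical helper, not part of the standard library.
    """
    pattern = re.compile(regex)
    # os.walk visits root and every subdirectory below it
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            # fullmatch requires the whole name to match, like glob does
            if pattern.fullmatch(filename):
                yield os.path.join(dirpath, filename)


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'sub'))
    for name in ('code.py', 'code1.py', 'code12.py',
                 os.path.join('sub', 'code2.py')):
        open(os.path.join(root, name), 'w').close()
    # Regex equivalent of the glob pattern code[0-9].py (the dot is escaped)
    for path in find_by_regex(root, r'code[0-9]\.py'):
        print(os.path.relpath(path, root))
```

Swapping in r'code[0-9]+\.py' would also match code12.py, which no single glob pattern can express exactly.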
Why Did I Use The Double Backslash \\?
The backslash (\) is the escape character in Python string literals, so to put one literal backslash into a regular string you have to write two (\\). The pattern 'C:\\**\\code.py' therefore contains single backslashes, which are the path separators on Windows. Alternatively, you can write a raw string such as r'C:\**\code.py', in which backslashes are not treated as escapes.
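A quick way to convince yourself: the snippet below prints the pattern string and compares it with its raw-string spelling. It also shows what can go wrong without escaping, using the made-up path C:\new, where \n silently becomes a newline character.

```python
# In a normal string literal, '\\' is an escape sequence for ONE backslash
pattern = 'C:\\**\\code.py'
print(pattern)              # C:\**\code.py
print(pattern.count('\\'))  # 2

# A raw string (r'...') does not process escapes, so it spells the same string
raw_pattern = r'C:\**\code.py'
print(pattern == raw_pattern)  # True

# Without escaping, '\n' in 'C:\new' turns into a single newline character
print(len('C:\new'))   # 5
print(len(r'C:\new'))  # 6
```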
What To Do On Linux?
On Linux, you can use the same glob module. However, because Linux separates directory levels with a forward slash (/), use that instead of the double backslash (\\).
For example, glob.glob('my_dir/**/code.py', recursive=True) returns all files named code.py in the my_dir directory and all of its subdirectories.
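Here is a sketch of the same search on Linux or macOS, done two ways: glob with forward slashes, and pathlib.Path.rglob from the standard library. rglob is a swapped-in alternative that matches a file-name pattern recursively, so you don’t write any separators yourself. The helper find_code_files is hypothetical, and my_dir is created inside a temporary directory for the demo.

```python
import glob
import os
import tempfile
from pathlib import Path


def find_code_files(my_dir):
    """Return sorted code.py paths under my_dir, found two ways."""
    # 1) glob with forward slashes, as in 'my_dir/**/code.py'
    with_glob = glob.glob(f'{my_dir}/**/code.py', recursive=True)
    # 2) pathlib: rglob('code.py') searches my_dir and all subdirectories
    with_pathlib = [str(p) for p in Path(my_dir).rglob('code.py')]
    return sorted(with_glob), sorted(with_pathlib)


with tempfile.TemporaryDirectory() as tmp:
    my_dir = os.path.join(tmp, 'my_dir')
    os.makedirs(os.path.join(my_dir, 'sub'))
    for rel in ('code.py', os.path.join('sub', 'code.py')):
        open(os.path.join(my_dir, rel), 'w').close()
    with_glob, with_pathlib = find_code_files(my_dir)
    print(with_glob)
    print(with_pathlib)
```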
What To Do On macOS?
Same as on Linux: macOS also uses the forward slash (/) as its path separator, so the previous answer applies unchanged.
Can I Learn More About Regular Expressions?
Sure, check out our full guide here: