Overview
Problem: How to choose a file starting with a given string?
Example: Suppose a directory contains several files, including 001_Jan_Backup_01.txt, 001_Jan_Backup_02.txt, and 001_Jan_Backup_03.txt.
How will you select the files starting with “001_Jan”?
Python Modules Cheat Sheet To Choose A File Starting With A Given String
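Choosing a file starting with a given string is easy if you know how to use Python's os, re, pathlib, and glob modules. Assume you want to select the files whose names start with '001_Jan' from the contents of a folder. In the snippets below, replace "prefix" with the string you are looking for and the folder placeholder with the path to your folder. You can use each module as follows: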
OS
import os

# List the folder's entries and keep those whose names start with the prefix
parent_path = os.listdir("<path to your folder>")
result = []
for file in parent_path:
    if file.startswith("prefix"):
        result.append(file)
print(result)
Re
import os
import re

parent_path = os.listdir("<path to your folder>")
result = []
for file in parent_path:
    # re.match() succeeds only if the pattern matches at the start of the name
    if re.match('prefix', file):
        result.append(file)
print(result)
Glob
from glob import glob

# 'prefix*' matches names in the current directory that start with "prefix"
result = glob('prefix*')
print(result)
Pathlib
from pathlib import Path

parent_path = Path('<path to your folder>')
result = [file.name for file in parent_path.iterdir() if file.name.startswith('prefix')]
print(result)
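The four snippets should agree on the same folder. A minimal sketch that runs them side by side in a throwaway directory; the sample file names and the helper select_all_ways() are hypothetical, and the re and glob variants add re.escape() and glob.escape() so the prefix is matched literally (the cheat sheet itself does not do this).

```python
import os
import re
import tempfile
from glob import escape, glob
from pathlib import Path

# Hypothetical sample names, modelled on the outputs shown in this article.
SAMPLE_FILES = [
    "001_Jan_Backup_01.txt",
    "001_Jan_Backup_02.txt",
    "001_Jan_Backup_03.txt",
    "002_Feb_Backup_01.txt",
    "index.html",
]


def select_all_ways(folder, prefix):
    """Hypothetical helper: apply each cheat-sheet approach and return sorted results."""
    by_os = [f for f in os.listdir(folder) if f.startswith(prefix)]
    # re.escape() makes the prefix match literally, even if it contains "." or "+"
    by_re = [f for f in os.listdir(folder) if re.match(re.escape(prefix), f)]
    # glob() returns full paths, so keep only the file names for comparison
    pattern = os.path.join(escape(folder), escape(prefix) + "*")
    by_glob = [os.path.basename(p) for p in glob(pattern)]
    by_pathlib = [p.name for p in Path(folder).iterdir() if p.name.startswith(prefix)]
    found = {"os": by_os, "re": by_re, "glob": by_glob, "pathlib": by_pathlib}
    # Directory order is arbitrary, so sort before comparing
    return {module: sorted(names) for module, names in found.items()}


with tempfile.TemporaryDirectory() as folder:
    for name in SAMPLE_FILES:
        Path(folder, name).touch()  # create an empty sample file
    results = select_all_ways(folder, "001_Jan")

for module, names in results.items():
    print(f"{module:8} {names}")
```

Sorting matters here: none of os.listdir(), glob(), or iterdir() promises any particular order.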
Now that you have a quick idea of how to approach the problem, let us dive into each solution and find out how it works.
Method 1: The OS Module
The os module provides functions for interacting with your operating system, including listing, creating, renaming, and deleting files and folders.
Approach: To choose the files starting with a given string within a specific directory, list the directory's contents with os.listdir() and then use the startswith() method to keep only the names that begin with the given string.
Code:
import os

parent_path = os.listdir(".")  # names of the entries in the current directory
result = []
for file in parent_path:
    if file.startswith("001_Jan"):
        result.append(file)
print(result)
Output: The result is a list containing the files that start with 001_Jan.
['001_Jan_Backup_01.txt', '001_Jan_Backup_02.txt', '001_Jan_Backup_03.txt']
Explanation: os.listdir(".") returns a list of the names of the entries in the current working directory, which we store in the parent_path variable. We then initialize an empty list, result. Next, we loop through these names, check whether each one starts with '001_Jan', and if it does, append it to the result list. Finally, we print the result using Python's print() function.
Note: startswith() is a built-in string method that returns True when the string starts with the specified value; otherwise it returns False.
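startswith() also accepts a tuple of prefixes and an optional start index. A short illustration; the file names other than the 001_Jan backups are hypothetical.

```python
names = ["001_Jan_Backup_01.txt", "002_Feb_Backup_01.txt", "index.html"]

# A tuple of prefixes matches if the name starts with ANY of them.
jan_or_feb = [n for n in names if n.startswith(("001_Jan", "002_Feb"))]
print(jan_or_feb)  # ['001_Jan_Backup_01.txt', '002_Feb_Backup_01.txt']

# The optional start argument begins the comparison at that index,
# here skipping the four-character "001_" counter.
print("001_Jan_Backup_01.txt".startswith("Jan", 4))  # True
```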
Solve Using a List Comprehension
You can implement the above solution in a single line with the help of a list comprehension as shown below.
import os

result = [filename for filename in os.listdir('.') if filename.startswith("001_Jan")]
print(result)
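The one-liner above also returns sub-directories whose names start with the prefix, and os.listdir() gives no ordering guarantee. A minimal sketch that handles both with os.scandir(), whose DirEntry objects expose is_file(); the helper name files_starting_with() and the directory name 001_Jan_archive are hypothetical.

```python
import os
import tempfile


def files_starting_with(folder, prefix):
    """Hypothetical helper: sorted names of regular files in `folder` that start with `prefix`."""
    with os.scandir(folder) as entries:
        # DirEntry.is_file() lets us skip sub-directories that share the prefix
        return sorted(
            entry.name
            for entry in entries
            if entry.is_file() and entry.name.startswith(prefix)
        )


# Usage: a sub-directory named like the backups is ignored.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "001_Jan_Backup_01.txt"), "w"):
        pass  # create an empty file
    os.mkdir(os.path.join(folder, "001_Jan_archive"))
    matches = files_starting_with(folder, "001_Jan")

print(matches)  # ['001_Jan_Backup_01.txt']
```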
Besides the os module, we can get the same result using the re (regular expressions), glob, and pathlib modules, as shown in the following sections.
Method 2: Using Regular Expressions
We can use the re module to work with regular expressions in Python. Regular expressions are a compact way to search for and match text patterns. We can use functions such as re.compile() and re.match(), together with metacharacters (. ^ $ [ ] ( ) | \) and quantifiers (* + ? {m,n}), to search strings of text.
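Because "." and the other metacharacters have special meanings, a prefix passed straight to re.match() may match more than intended. A small illustration with a hypothetical prefix containing a dot; re.escape() turns it into a literal pattern.

```python
import re

name = "001_Jan_Backup_01.txt"

# "." matches any character, so the raw pattern "001.Jan" also matches "001_Jan".
print(bool(re.match("001.Jan", name)))             # True
# re.escape() makes the dot literal, so only a real "." would match.
print(bool(re.match(re.escape("001.Jan"), name)))  # False
```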
Note:
- The re.match(pattern, string) function returns a match object if the pattern matches at the beginning of the string, and None otherwise. The match object contains useful information such as the matching groups and the matching positions. An optional argument, flags, allows you to customize the regex engine, for example to ignore capitalization.
- The re.findall(pattern, string) function scans the string from left to right, searching for all non-overlapping matches of the pattern. It returns a list of strings in the order in which the matches were found.
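A quick illustration of both functions and of the flags argument, using a hypothetical all-lowercase file name.

```python
import re

name = "001_jan_backup_01.txt"  # hypothetical lowercase file name

# re.match() only looks at the start of the string.
m = re.match("001_Jan", name, flags=re.IGNORECASE)
print(m.group())  # 001_jan  (the matched text keeps its original casing)
print(m.span())   # (0, 7)
print(re.match("jan", name))  # None: "jan" is not at the start

# re.findall() scans the whole string and returns every non-overlapping match.
print(re.findall(r"\d+", name))  # ['001', '01']
```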
Approach: We can use the re.match() function as demonstrated below to choose the files starting with a given string.
import os
import re

parent_path = os.listdir(".")
result = []
for file in parent_path:
    if re.match('001_Jan', file):
        result.append(file)
print(result)
Output:
['001_Jan_Backup_01.txt', '001_Jan_Backup_02.txt', '001_Jan_Backup_03.txt']
Explanation: re.match() checks one string at a time, so we call it inside a loop on each file name. It returns a match object (which is truthy) when the name begins with '001_Jan' and None otherwise, so only the matching names are appended to result.
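The same loop can be written as a list comprehension with a pattern compiled once. A minimal sketch; the helper regex_prefix_filter() and the sample names are hypothetical. It also shows why re.match() fits this task better than re.search(), which finds the pattern anywhere in the name.

```python
import re


def regex_prefix_filter(names, prefix, flags=0):
    """Hypothetical helper: keep names that begin with `prefix`, matched literally."""
    pattern = re.compile(re.escape(prefix), flags)
    # pattern.match() is anchored at the start of each name
    return [name for name in names if pattern.match(name)]


names = ["001_Jan_Backup_01.txt", "backup_001_Jan.txt", "index.html"]
print(regex_prefix_filter(names, "001_Jan"))  # ['001_Jan_Backup_01.txt']

# re.search() is not anchored, so it also accepts names that merely contain the text.
print([n for n in names if re.search("001_Jan", n)])
# ['001_Jan_Backup_01.txt', 'backup_001_Jan.txt']
```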
Method 3: Using The Glob Module
The glob module is one of Python's built-in modules for finding path names. It returns all path names that match a pattern, following the rules used by the Unix shell. The main difference between the glob and re modules is that while regular expressions offer many special characters and quantifiers, glob patterns support only three wildcards:
- * matches any sequence of characters, including none,
- ? matches exactly one character, and
- [] matches any single character from the set inside the brackets.
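According to its documentation, glob combines os.scandir() with fnmatch.fnmatch() rather than invoking a shell. fnmatch.fnmatchcase() (the case-sensitive variant) lets us try the three wildcards against a single name without touching the disk; a small illustration:

```python
from fnmatch import fnmatchcase

name = "001_Jan_Backup_01.txt"

print(fnmatchcase(name, "001_Jan*"))                  # True: * matches any run of characters
print(fnmatchcase(name, "001_Jan_Backup_0?.txt"))     # True: ? matches exactly one character
print(fnmatchcase(name, "001_Jan_Backup_?.txt"))      # False: ? cannot match the two characters "01"
print(fnmatchcase(name, "001_Jan_Backup_0[12].txt"))  # True: [12] matches one character from the set
```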
Approach: Append the * wildcard to the prefix. The pattern '001_Jan*' matches every name in the current directory that starts with "001_Jan", whatever follows it.
from glob import glob

result = glob('001_Jan*')
print(result)
Output:
['001_Jan_Backup_01.txt', '001_Jan_Backup_02.txt', '001_Jan_Backup_03.txt']
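To search a folder other than the current one, join the folder and the pattern; glob() then returns full paths. If the prefix itself contains a wildcard character such as "[", glob.escape() keeps it literal. A minimal sketch; the helper glob_prefix() and the sample file names are hypothetical.

```python
import os
import tempfile
from glob import escape, glob


def glob_prefix(folder, prefix):
    """Hypothetical helper: sorted names in `folder` that start with `prefix`, via glob."""
    # escape() neutralises *, ? and [ so the folder and prefix match literally
    pattern = os.path.join(escape(folder), escape(prefix) + "*")
    # glob() returns paths including the folder, so keep only the base names
    return sorted(os.path.basename(path) for path in glob(pattern))


with tempfile.TemporaryDirectory() as folder:
    for name in ["[draft]_notes.txt", "d_notes.txt", "001_Jan_Backup_01.txt"]:
        with open(os.path.join(folder, name), "w"):
            pass  # create an empty file
    # Unescaped, "[draft]*" means "starts with one of d, r, a, f, t".
    unescaped = sorted(os.path.basename(p) for p in glob(os.path.join(escape(folder), "[draft]*")))
    bracketed = glob_prefix(folder, "[draft]")
    jan = glob_prefix(folder, "001_Jan")

print(unescaped)  # ['d_notes.txt']
print(bracketed)  # ['[draft]_notes.txt']
print(jan)        # ['001_Jan_Backup_01.txt']
```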
Method 4: Simplify The Process With The Pathlib Module
Python 3.4+ ships with the pathlib module, which simplifies file navigation and searches. Path objects use the correct path separator for the operating system, so the same code works on both Unix and Windows. Path also offers methods that mirror familiar Unix shell commands, such as touch(), unlink(), and rmdir(), plus joinpath() (or the / operator) for joining path segments.
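On a given machine, Path() picks the right flavour automatically. To show both separators deterministically, this sketch uses the pure path classes directly; the folder names "backups" and "2023" are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The / operator joins path segments; each flavour renders its own separator.
posix = PurePosixPath("backups") / "2023" / "001_Jan_Backup_01.txt"
windows = PureWindowsPath("backups") / "2023" / "001_Jan_Backup_01.txt"
print(posix)    # backups/2023/001_Jan_Backup_01.txt
print(windows)  # backups\2023\001_Jan_Backup_01.txt

# joinpath() is the method form of the same operation.
print(PurePosixPath("backups").joinpath("2023", "001_Jan_Backup_01.txt") == posix)  # True

# name, stem and suffix split the final component.
print(posix.name, posix.stem, posix.suffix)  # 001_Jan_Backup_01.txt 001_Jan_Backup_01 .txt
```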
Approach: Use Path to point at the directory, then iterate over its entries with iterdir() and keep the names that start with the given string.
Example:
# Import the Path class
from pathlib import Path

# Point at the directory to search (here, the current working directory)
parent_path = Path('.')
# Iterate over its entries, keeping the names that start with the prefix
result = [file.name for file in parent_path.iterdir() if file.name.startswith('001_Jan')]
print(result)
Output:
['001_Jan_Backup_01.txt', '001_Jan_Backup_02.txt', '001_Jan_Backup_03.txt']
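pathlib can also apply glob-style wildcards directly through Path.glob(), which yields Path objects. A minimal sketch in a temporary directory with hypothetical sample files; the is_file() check skips any sub-directory that happens to match.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ["001_Jan_Backup_01.txt", "001_Jan_Backup_02.txt", "index.html"]:
        (folder / name).touch()  # create an empty sample file
    # Path.glob() uses the same shell-style wildcards as the glob module
    result = sorted(p.name for p in folder.glob("001_Jan*") if p.is_file())

print(result)  # ['001_Jan_Backup_01.txt', '001_Jan_Backup_02.txt']
```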
Conclusion
You can easily choose a file starting with a given string in Python. As illustrated in this tutorial, you can use any of the os, re, glob, or pathlib modules to do it. Happy learning!