Choose A File Starting With A Given String

Overview

Problem: How to choose a file starting with a given string?

Example: Consider that we have a directory with files as shown below.

How will you select the files starting with “001_Jan”?

Python Modules Cheat Sheet To Choose A File Starting With A Given String

Choosing a file starting with a given string is easy if you know how to use Python's os, re, glob, and pathlib modules. Assume you want to select the files whose names start with a given prefix, such as '001_Jan', from a directory. You can use each module as follows:

➀OS

import os

parent_path = os.listdir("<the folder hosting my-file.txt>")

result = []

for file in parent_path:
  if file.startswith("prefix"):
    result.append(file)

print(result)

➁Re

import os, re

parent_path = os.listdir("<the folder hosting my-file.txt>")

result = []

for file in parent_path:
   if re.match('prefix', file):
       result.append(file)

print(result)

➂Glob

from glob import glob

result = glob('prefix*')
print(result)

➃Pathlib

from pathlib import Path

parent_path = Path('<the folder hosting my-file.txt>/')

result = [file.name for file in parent_path.iterdir() if file.name.startswith('prefix')]
print(result)

Now that you have a quick idea of how to approach the problem, let us dive into each solution and see how it works.

Method 1: The OS Module

The os module is Python's standard interface to operating system functionality, including listing and manipulating files and folders.

Approach: To choose a file starting with a given string within a specific directory, list the contents of that directory with os.listdir() and then use the startswith() method to keep the names that begin with the given string.

Code:

import os
parent_path = os.listdir(".")

result = []

for file in parent_path:
  if file.startswith("001_Jan"):
    result.append(file)

print(result)

Output: The result is a list containing the files starting with 001_Jan.

['001_Jan_Backup_01.txt', '001_Jan_Backup_02.txt', '001_Jan_Backup_03.txt']

Explanation: os.listdir(".") returns the names of all entries in the current working directory, and we store that list in the parent_path variable. We then initialize an empty list, result. Next, we loop through those names, pick out each one that starts with ‘001_Jan’, and append it to the result list. Finally, we print the result using Python’s print() function.

Note: startswith() is a built-in method in Python that returns True when a string starts with a specified value; otherwise it returns False.
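As a small illustration of the note above, the sketch below runs startswith() on a few hypothetical file names (they are made up for this example, as is the helper starts_with_any). It also shows two details worth knowing: the method accepts a tuple of prefixes, and the comparison is case-sensitive.

```python
def starts_with_any(name, prefixes):
    """Return True if name starts with any of the given prefixes."""
    # str.startswith accepts a tuple of prefixes, so no explicit loop is needed.
    return name.startswith(tuple(prefixes))


# Hypothetical file names, used only for illustration.
names = ["001_Jan_Backup_01.txt", "002_Feb_Backup_01.txt", "notes.txt"]

# Single prefix
print([n for n in names if n.startswith("001_Jan")])

# Several prefixes at once
print([n for n in names if starts_with_any(n, ["001_Jan", "002_Feb"])])

# The comparison is case-sensitive, so this prints False
print("001_jan_backup.txt".startswith("001_Jan"))
```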

Solve Using a List Comprehension

You can implement the above solution in a single line with the help of a list comprehension as shown below.

import os
result = [filename for filename in os.listdir('.') if filename.startswith("001_Jan")]
print(result)
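Keep in mind that os.listdir() returns the names of subdirectories as well as files, in arbitrary order. If you only want regular files, one possible variation is sketched below with os.scandir(). The function name files_with_prefix is an assumption made for this example, not part of the os module.

```python
import os


def files_with_prefix(directory, prefix):
    """Return the sorted names of regular files in directory that start with prefix."""
    # os.scandir yields DirEntry objects, which can tell files from folders.
    with os.scandir(directory) as entries:
        return sorted(
            entry.name
            for entry in entries
            if entry.is_file() and entry.name.startswith(prefix)
        )


if __name__ == "__main__":
    print(files_with_prefix(".", "001_Jan"))
```

Sorting the result makes the output predictable, which the plain os.listdir() call does not guarantee.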

Besides the os module, we can get the same result using regular expressions, the glob module, and the pathlib module, as shown in the following sections.

Method 2: Using Regular Expressions

We can use the re module to work with regular expressions in Python. Regular expressions are patterns for searching and matching text. Functions such as re.compile() and re.match(), combined with metacharacters (. * ^ ? + $ { } [ ] ( ) \ |) and quantifiers, let us search strings of text.

Note:

  • The re.match(pattern, string) method returns a match object if the pattern matches at the beginning of the string. The match object contains useful information such as the matching groups and the matching positions. An optional argument flags allows you to customize the regex engine, for example to ignore capitalization.
  • The re.findall(pattern, string) method scans string from left to right, searching for all non-overlapping matches of the pattern. It returns a list of strings in the matching order when scanning the string from left to right.
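The difference between the two functions can be seen in a short sketch. The file names below are hypothetical and exist only for illustration. The call to re.escape() is an extra precaution that is not strictly needed for '001_Jan', but it keeps a prefix containing characters such as . or + from being read as regex syntax.

```python
import re

# Hypothetical file names, used only for illustration.
names = ["001_Jan_Backup_01.txt", "backup_001_Jan.txt", "001_jan_notes.txt"]

prefix = "001_Jan"
# Escape the prefix so it is matched literally, then compile it once.
pattern = re.compile(re.escape(prefix))

# match() only succeeds when the pattern is found at the start of the string.
matched = [n for n in names if pattern.match(n)]
print(matched)  # ['001_Jan_Backup_01.txt']

# The flags argument relaxes the match, here by ignoring case.
matched_any_case = [
    n for n in names if re.match(re.escape(prefix), n, flags=re.IGNORECASE)
]
print(matched_any_case)  # ['001_Jan_Backup_01.txt', '001_jan_notes.txt']

# findall() scans the whole string and returns every non-overlapping match.
print(re.findall(r"\d+", "001_Jan_Backup_01.txt"))  # ['001', '01']
```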

Approach: We can use the re.match() method as demonstrated below to choose the files starting with a given string.

import os
import re

parent_path = os.listdir(".")
result = []
for file in parent_path:
    if re.match('001_Jan', file):
        result.append(file)
print(result)

Output:

['001_Jan_Backup_01.txt', '001_Jan_Backup_02.txt', '001_Jan_Backup_03.txt']

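Explanation: re.match() tests a single string and succeeds only if the pattern matches at its beginning, so we call it inside a loop to check every file name in the directory. Each name that matches is appended to result. Because the first argument is treated as a regular expression, a prefix that contains metacharacters such as . or + should be passed through re.escape() first.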

Method 3: Using The Glob Module

The glob module is one of Python’s built-in modules for finding path names. It follows the pattern-matching rules of the Unix shell rather than full regular expression syntax. While regular expressions offer many metacharacters and quantifiers, glob patterns rely on only three wildcards:

  • * matches any number of characters, including none,
  • ? matches exactly one character, and
  • [] matches one character from the enclosed set or range.
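To see how the three wildcards behave without touching the file system, the sketch below applies them to a list of hypothetical file names using fnmatchcase() from the standard fnmatch module, which implements the same shell-style matching. The helper select() is made up for this example.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, used only for illustration.
names = [
    "001_Jan_Backup_01.txt",
    "001_Jan_Backup_02.txt",
    "001_Feb_Backup_01.txt",
    "old_001_Jan.txt",
]


def select(pattern):
    """Return the names that match a shell-style pattern (case-sensitive)."""
    return [n for n in names if fnmatchcase(n, pattern)]


# * after the prefix: names that START with 001_Jan
print(select("001_Jan*"))

# ? stands for exactly one character
print(select("001_Jan_Backup_0?.txt"))

# [] stands for one character from the set, here J or F
print(select("001_[JF]*"))

# * on both sides: names that merely CONTAIN 001_Jan
print(select("*001_Jan*"))
```

Note that only the first pattern expresses "starts with"; the last one also picks up old_001_Jan.txt.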

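Approach: We can use the pattern '001_Jan*' to choose all files whose names start with “001_Jan”. Placing a * before the prefix as well, as in '*001_Jan*', would also match names that merely contain the string.

from glob import glob

result = glob('001_Jan*')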
print(result)

Output:

['001_Jan_Backup_01.txt', '001_Jan_Backup_02.txt', '001_Jan_Backup_03.txt']
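The call above searches the current working directory. To search another folder, one option is to build the pattern from the folder path, as in the following sketch. The function name glob_prefix is an assumption for this example. glob.escape() is used so that any *, ? or [ characters in the folder name or prefix are treated literally.

```python
import os
from glob import escape, glob


def glob_prefix(directory, prefix):
    """Return the sorted base names of entries in directory that start with prefix."""
    # Escape the literal parts, then append * to match the rest of the name.
    pattern = os.path.join(escape(directory), escape(prefix) + "*")
    # glob returns full paths here, so reduce them to file names.
    return sorted(os.path.basename(path) for path in glob(pattern))


if __name__ == "__main__":
    print(glob_prefix(".", "001_Jan"))
```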

Method 4: Simplify The Process With The Pathlib Module

The pathlib module, part of the standard library since Python 3.4, simplifies file navigation and searches. It represents paths as objects and handles the path separator for you, so the same code works on Unix and Windows. Several of its methods are named after Unix shell commands, such as touch, mkdir, unlink, and rmdir.

Approach: You can use Path to locate the directory and then search the files starting with a given string by iterating across the files in the directory.

Example:

# Import the library
from pathlib import Path

# Point to the directory to search (here, the current working directory)
parent_path = Path('.')

# Iterate over the directory entries and keep the names that start with the prefix
result = [file.name for file in parent_path.iterdir() if file.name.startswith('001_Jan')]

print(result)

Output:

['001_Jan_Backup_01.txt', '001_Jan_Backup_02.txt', '001_Jan_Backup_03.txt']
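Path objects also offer a glob() method, so the wildcard approach and pathlib can be combined. The sketch below is one way to do that. The function name paths_with_prefix is hypothetical, and the code assumes the prefix itself contains no wildcard characters.

```python
from pathlib import Path


def paths_with_prefix(directory, prefix):
    """Return sorted Path objects for regular files in directory whose names start with prefix."""
    # Assumption: prefix contains no *, ? or [ characters.
    candidates = Path(directory).glob(prefix + "*")
    # Keep regular files only; glob() also yields matching subdirectories.
    return sorted(path for path in candidates if path.is_file())


if __name__ == "__main__":
    for path in paths_with_prefix(".", "001_Jan"):
        print(path.name)
```

Returning Path objects instead of plain names keeps the full location of each file, which is convenient if you want to open or rename the matches afterwards.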

Conclusion

You can easily choose a file starting with a given string in Python. As illustrated in this tutorial, all you need to do is pick one of the os, re, glob, or pathlib modules. Please subscribe and stay tuned for more interesting articles in the future. Happy learning!