5 Best Ways to Traverse a Directory Recursively in Python

💡 Problem Formulation: When working with file systems, a common task is to traverse directories recursively to handle files and folders. In Python, several methods exist to achieve this, each with its own use case. Whether you’re summarizing directory contents, searching for files, or performing batch operations, the goal is to efficiently list all files and directories within a given path. For example, given a directory /my_folder, we want to recursively list all files and subdirectories contained within it.

Method 1: Using os.walk()

The os.walk() function is a versatile tool for traversing directory trees. It generates the file names in a directory tree by walking either top-down or bottom-up. For each directory within the tree, it yields a 3-tuple containing the directory path, directory names, and file names.

Here’s an example:

import os

for root, dirs, files in os.walk('/my_folder'):
    print(f'Current Path: {root}')
    print(f'Subdirectories: {dirs}')
    print(f'Files: {files}')
    print('--------------')

Output:

Current Path: /my_folder
Subdirectories: ['subfolder1', 'subfolder2']
Files: ['file1.txt', 'file2.txt']
--------------
Current Path: /my_folder/subfolder1
Subdirectories: []
Files: ['file3.txt']
--------------
... and so on for each subdirectory ...

This code uses os.walk() to iterate over the directory ‘/my_folder’. In each iteration, it prints the current directory path, the subdirectories within it, and the files it contains. This method is comprehensive but may be overkill if you simply need a list of files.
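When a flat list of files is all that is needed, the 3-tuples can be collapsed into full paths with os.path.join(). The sketch below is one possible way to do that; the function name list_files and the skip_dirs parameter are illustrative assumptions, not part of the os module. It also shows that, in the default top-down mode, editing the dirs list in place stops os.walk() from descending into the removed directories.

```python
import os


def list_files(top, skip_dirs=(".git", "__pycache__")):
    """Return a sorted list of full file paths under top.

    list_files and skip_dirs are hypothetical names chosen for this sketch.
    """
    found = []
    for root, dirs, files in os.walk(top):
        # In top-down mode, modifying dirs in place prunes the walk:
        # os.walk() will not descend into directories removed here.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in files:
            found.append(os.path.join(root, name))
    return sorted(found)
```

Calling list_files('/my_folder') would return every file path under the example directory, except those inside directories named in skip_dirs.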

Method 2: Using Pathlib

The modern approach in Python 3.4+ uses the pathlib module, which provides an object-oriented interface for file system paths. Specifically, the Path.rglob() method is useful for recursive directory traversal, matching all files and directories with a specific pattern.

Here’s an example:

from pathlib import Path

for path in Path('/my_folder').rglob('*'):
    print(path.name)

Output (the order of entries may vary by platform):

subfolder1
subfolder2
file1.txt
file2.txt
file3.txt
... and so on for each file and subdirectory ...

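The above code snippet uses Path.rglob() to visit every file and directory under ‘/my_folder’. rglob('*') matches everything, making it a simple and elegant way to list all items recursively. Because the loop prints path.name, only the final path component is shown; print path itself to see the full location.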
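Since rglob() returns directories as well as files, a common refinement is to filter with is_file() and narrow the pattern. The following is a minimal sketch of that idea; find_files is a hypothetical helper name, and returning paths relative to the starting directory is just one design choice.

```python
from pathlib import Path


def find_files(top, pattern="*"):
    """Return regular files under top that match pattern, relative to top.

    find_files is a hypothetical helper written for this example.
    """
    base = Path(top)
    # rglob() yields directories too, so keep only regular files.
    return sorted(p.relative_to(base) for p in base.rglob(pattern) if p.is_file())
```

For instance, find_files('/my_folder', '*.txt') would return only the text files, each as a Path relative to /my_folder.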

Method 3: Using glob.glob() with the recursive flag

The glob module’s glob() function has supported a recursive parameter since Python 3.5. With recursive=True, the ‘**’ wildcard matches any number of nested directories, including none, so a single pattern can match files throughout a directory tree.

Here’s an example:

import glob

for filepath in glob.glob('/my_folder/**/*', recursive=True):
    print(filepath)

Output (the order of entries may vary by platform):

/my_folder/file1.txt
/my_folder/file2.txt
/my_folder/subfolder1
/my_folder/subfolder2
/my_folder/subfolder1/file3.txt
... and so on through the file system ...

The code uses glob.glob() with recursive=True to find every path under ‘/my_folder’, files and directories alike, and prints each one. This is convenient for filename pattern matching.
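The pattern-matching strength of glob shows best when the pattern is narrower than ‘*’. The sketch below assumes a hypothetical helper named find_by_extension; it uses glob.iglob(), the iterator variant of glob.glob(), and glob.escape() so that special characters in the starting directory are treated literally. Keep in mind that, by default, glob patterns do not match names that begin with a dot.

```python
import glob
import os


def find_by_extension(top, extension):
    """Return a sorted list of files under top whose names end in extension.

    find_by_extension is a hypothetical helper written for this example.
    """
    # '**' matches zero or more directories when recursive=True,
    # so files directly inside top are included as well.
    pattern = os.path.join(glob.escape(top), "**", "*" + extension)
    return sorted(
        path for path in glob.iglob(pattern, recursive=True)
        if os.path.isfile(path)
    )
```

With the example directory, find_by_extension('/my_folder', '.txt') would return the full paths of the text files at every level.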

Method 4: Using os.listdir() in a Recursive Function

For a custom recursive directory traversal, combine os.listdir() with a recursive function. This can be useful for more complex traversal scenarios beyond just listing files and directories.

Here’s an example:

import os

def list_dir_recursive(directory):
    for entry in os.listdir(directory):
        full_path = os.path.join(directory, entry)
        if os.path.isdir(full_path):
            list_dir_recursive(full_path)
        else:
            print(full_path)

list_dir_recursive('/my_folder')

Output:

/my_folder/file1.txt
/my_folder/file2.txt
/my_folder/subfolder1/file3.txt
... and so on for each file ...

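The function list_dir_recursive() prints each file path within ‘/my_folder’, calling itself whenever it encounters a directory. This is a more manual but highly customizable method. Note that os.path.isdir() follows symbolic links, so a link that points back to a parent directory can send the function into runaway recursion.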
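As an illustration of the kind of customization this approach allows, here is a sketch that limits how deep the traversal goes and does not descend into symbolic links. The name list_dir_limited and the max_depth and _depth parameters are assumptions made for this example; the function returns a list rather than printing so the result can be reused.

```python
import os


def list_dir_limited(directory, max_depth=2, _depth=0):
    """Return file paths under directory, descending at most max_depth levels.

    list_dir_limited, max_depth and _depth are hypothetical names.
    """
    results = []
    for entry in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, entry)
        # Recurse only into real directories; a symlink to a directory
        # is reported as a plain entry instead of being followed.
        if os.path.isdir(full_path) and not os.path.islink(full_path):
            if _depth < max_depth:
                results.extend(list_dir_limited(full_path, max_depth, _depth + 1))
        else:
            results.append(full_path)
    return results
```

With max_depth=0 only the entries directly inside the starting directory are returned; each increment allows one more level of subdirectories.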

Bonus Method 5: Using os.scandir() and a Recursive Generator

With Python 3.5+, os.scandir() can be used for more efficient directory iteration. Combined with a generator function and yield from, it makes a compact recursive directory walker.

Here’s an example:

import os

def scandir_recursive(directory):
    for entry in os.scandir(directory):
        if entry.is_dir(follow_symlinks=False):
            yield from scandir_recursive(entry.path)
        else:
            yield entry.path

print(list(scandir_recursive('/my_folder')))

Output:

['/my_folder/file1.txt', '/my_folder/file2.txt', '/my_folder/subfolder1/file3.txt', ...]

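The generator function scandir_recursive() is a Pythonic and efficient way to list all files in ‘/my_folder’: it yields the path of each file it encounters and recurses into subdirectories with yield from. Each os.DirEntry caches its file-type information, so is_dir() usually needs no extra system call. Passing follow_symlinks=False keeps the walker from descending into symbolic links to directories.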
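The DirEntry objects returned by os.scandir() also expose stat(), which makes tasks such as summing file sizes straightforward. The sketch below assumes a hypothetical function named total_size; it uses os.scandir() as a context manager, which is available from Python 3.6.

```python
import os


def total_size(directory):
    """Return the combined size in bytes of all regular files under directory.

    total_size is a hypothetical helper written for this example.
    """
    total = 0
    # The with-statement closes the scandir iterator promptly (Python 3.6+).
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                total += total_size(entry.path)
            elif entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

Symbolic links are skipped entirely here, so neither linked files nor linked directories contribute to the total.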

Summary/Discussion

  • Method 1: os.walk(). It’s the go-to method for many due to its simplicity. Since Python 3.5 it is built on os.scandir(), so it is usually fast enough, though it builds full lists of directory and file names for every directory it visits, which is more than a simple file listing needs.
  • Method 2: Pathlib. Offers an elegant, object-oriented approach to file system paths. It’s a modern and readable technique, but creating a Path object for every entry may make it slower than os.scandir() on very large directory structures.
  • Method 3: glob.glob(). Ideal for pattern matching and concise code, but lacks the finer control and performance of more direct filesystem traversal methods.
  • Method 4: os.listdir() in a Recursive Function. It allows for tailor-made traversal logic, offering flexibility at the cost of more boilerplate code. A good choice for specific use-cases where control is crucial.
  • Bonus Method 5: os.scandir(). Provides a compact and efficient solution, useful when performance is a concern, though it lacks the simplicity of os.walk() at first glance.
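Performance depends heavily on the file system and the shape of the tree, so it is worth measuring on your own data rather than relying on general rules. The following is a rough timing sketch, not a rigorous benchmark; the function names count_with_walk, count_with_rglob, count_with_scandir and benchmark are all assumptions made for this example.

```python
import os
import time
from pathlib import Path


def count_with_walk(top):
    return sum(len(files) for _, _, files in os.walk(top))


def count_with_rglob(top):
    return sum(1 for p in Path(top).rglob("*") if p.is_file())


def count_with_scandir(top):
    count = 0
    with os.scandir(top) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                count += count_with_scandir(entry.path)
            else:
                count += 1
    return count


def benchmark(top, repeat=3):
    """Return {function name: best wall-clock seconds over repeat runs}."""
    timings = {}
    for func in (count_with_walk, count_with_rglob, count_with_scandir):
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            func(top)
            best = min(best, time.perf_counter() - start)
        timings[func.__name__] = best
    return timings
```

The three counters agree on trees made of ordinary files and directories; their counts can differ when symbolic links are present, because each method treats links slightly differently.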