Python os.walk() – A Simple Illustrated Guide

According to the official Python 3.10.3 documentation, the os module provides miscellaneous operating system interfaces. It gives us access to many operating-system-dependent features, one of which is generating the file names in a directory tree with os.walk().

If it sounds great to you, please continue reading, and you will fully understand os.walk through Python code snippets and vivid visualization.

In this article, I will first introduce the usage of os.walk and then address three top questions about os.walk, including passing a file’s filepath to os.walk, os.walk vs. os.listdir, and os.walk recursive.

How to Use os.walk and the topdown Parameter?

Syntax

Here is the syntax for os.walk:

os.walk(top[, topdown=True[, onerror=None[, followlinks=False]]])

Input

1. Must-have parameters:

  • top: accepts the path string of the directory that you want to use as the root for generating filenames. (A path to a file is accepted too, but it yields nothing, as discussed later in this article.)

2. Optional parameters:

  • topdown: accepts a boolean value, default=True. If True or not specified, each directory is yielded before its subdirectories (top-down). If False, each directory is yielded after its subdirectories (bottom-up). If this parameter still confuses you, as it did me when I first met os.walk, the example below visualizes it.
  • onerror: accepts a function with one argument, default=None. It can report the error to continue with the walk, or raise the exception to abort the walk.
  • followlinks: accepts a boolean value, default=False. If True, we visit directories pointed to by symlinks, on systems that support them.

💡 Tip: Generally, you only need the first two parameters, top and topdown.
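The onerror parameter is easier to understand with a tiny experiment. The sketch below is only an illustration: the handler name abort_on_error and the missing path are made up for this example. It shows that errors are silently ignored by default, and that a handler which re-raises the error aborts the walk.

```python
import os


def abort_on_error(error):
    # os.walk passes in the OSError instance it ran into.
    # Re-raising it aborts the walk.
    raise error


# Hypothetical path, assumed not to exist on your machine.
missing_path = './this_directory_does_not_exist'

# Default (onerror=None): the error is ignored and nothing is yielded.
print(list(os.walk(missing_path)))

# With a raising handler, the same problem becomes visible.
try:
    list(os.walk(missing_path, onerror=abort_on_error))
except OSError as error:
    print(f'Walk aborted: {type(error).__name__}')
```

A handler that only logs the error and returns would let the walk continue with the remaining directories instead.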

Output

Yields 3-tuples (dirpath, dirnames, filenames) for each directory in the tree rooted at directory top (including top itself).
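To make the shape of these 3-tuples concrete, here is a minimal sketch that builds a throwaway directory tree with tempfile and inspects the first tuple. The layout (sub, a.txt, b.txt) is invented for this illustration and is not the example tree used in the rest of the article.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout: top/ contains sub/ and a.txt; sub/ contains b.txt.
    os.mkdir(os.path.join(top, 'sub'))
    for relative in ('a.txt', os.path.join('sub', 'b.txt')):
        with open(os.path.join(top, relative), 'w'):
            pass

    walked = list(os.walk(top))
    dirpath, dirnames, filenames = walked[0]

    print(len(walked))     # one tuple per directory: top and sub
    print(dirpath == top)  # dirpath is the path of the directory being visited
    print(dirnames)        # names only, not full paths
    print(filenames)       # names only, not full paths
    # Join dirpath and a name whenever a full path is needed.
    print(os.path.join(dirpath, filenames[0]))
```

Note that dirnames and filenames hold bare names, so os.path.join(dirpath, name) is the usual way to get a usable path.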

Example

I think the best way to comprehend os.walk is to walk through an example.

Our example directory tree and its labels are:

[Image: the example directory tree rooted at learn_os_walk, with its directories and files labeled]

By the way, the difference between a directory and a file is that a directory can contain many files. For example, directory D above contains 4.txt and 5.txt.

Back to our example, our goals are to:

  • Generate filenames based on the root directory, learn_os_walk
  • Understand the difference between topdown=True and topdown=False

To use os.walk(), we first need to import the os module:

import os

Then we can pass the input parameters to the os.walk and generate filenames. The code snippet is:

a_directory_path = './learn_os_walk'


def take_a_walk(fp, topdown_flag=True):
    print(f'\ntopdown_flag:{topdown_flag}\n')
    for pathname, subdirnames, subfilenames in os.walk(fp, topdown=topdown_flag):
        print(pathname)
        print(subdirnames)
        print(subfilenames)
        print('--------------------------------')
    print('What a walk!')


# *Try to walk in a directory path
take_a_walk(a_directory_path)
# Output more than Just 'What a walk!'
# Also all the subdirnames and subfilenames in each file tree level.
# BTW if you want to look through all files in a directory, you can add
# another for subfilename in subfilenames loop inside.

The above code defines a function, take_a_walk, that uses os.walk in a for loop. This is the most common usage of os.walk: it lets you visit every level of the tree below the root directory and get the directory names and filenames at each level.

If you are familiar with Python generators, you have probably already figured out that os.walk returns a generator that yields one 3-tuple after another.
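A short sketch can confirm the generator behavior. It walks the current directory ('.'), chosen here only because it always exists, and pulls a single tuple by hand with next().

```python
import inspect
import os

walker = os.walk('.')               # nothing has been scanned yet
print(inspect.isgenerator(walker))  # os.walk hands back a generator object

first = next(walker)                # scans only the top directory
print(first[0])                     # dirpath of the first tuple is top itself
print(len(first))                   # (dirpath, dirnames, filenames)

walker.close()                      # stop early without visiting the rest
```

Because the tuples are produced lazily, a for loop over os.walk can start working on the first directory before the rest of the tree has been read.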

Back to this code: we pass True for the topdown argument. Visually, the top-down search order follows the orange arrow in the picture below:

[Image: the example directory tree with an orange arrow showing the top-down walk order]

If we run the above code, we get the result below:

If we set the topdown to be False, we are walking the directory tree from its bottom directory D like this:

[Image: the example directory tree showing the bottom-up walk order, starting from directory D]

The corresponding code snippet is:

a_directory_path = './learn_os_walk'


def take_a_walk(fp, topdown_flag=False):
    print(f'\ntopdown_flag:{topdown_flag}\n')
    for pathname, subdirnames, subfilenames in os.walk(fp, topdown=topdown_flag):
        print(pathname)
        print(subdirnames)
        print(subfilenames)
        print('--------------------------------')
    print('What a walk!')


# *Try to walk in a directory path
take_a_walk(a_directory_path)
# Output more than Just 'What a walk!'
# Also all the subdirnames and subfilenames in each file tree level.
# BTW if you want to look through all files in a directory, you can add
# another for subfilename in subfilenames loop inside.

If we run the above code, we get the result below:

Now, I hope you understand how to use os.walk and the difference between topdown=True and topdown=False. 🙂
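If you cannot reproduce the example tree, the following sketch shows the same difference on a throwaway tree. The nested layout top/A/B is made up for this illustration; a single chain of directories is used so that the visiting order is unambiguous.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout: top/A/B, a single chain of nested directories.
    os.makedirs(os.path.join(top, 'A', 'B'))

    # Record the visited directories relative to top, in visiting order.
    top_down = [
        os.path.relpath(pathname, top)
        for pathname, _, _ in os.walk(top, topdown=True)
    ]
    bottom_up = [
        os.path.relpath(pathname, top)
        for pathname, _, _ in os.walk(top, topdown=False)
    ]

print(top_down)   # the root ('.') comes first
print(bottom_up)  # the deepest directory comes first, the root last
```

With topdown=True a directory is always reported before its subdirectories, and with topdown=False it is always reported after them.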

Here’s the full code for this example:

__author__ = 'Anqi Wu'

import os

a_directory_path = './learn_os_walk'
a_file_path = './learn_os_walk.py'  # same as a_file_path = __file__


def take_a_walk(fp, topdown_flag=True):
    print(f'\ntopdown_flag:{topdown_flag}\n')
    for pathname, subdirnames, subfilenames in os.walk(fp, topdown=topdown_flag):
        print(pathname)
        print(subdirnames)
        print(subfilenames)
        print('--------------------------------')
    print('What a walk!')


# *Try to walk in a file path
take_a_walk(a_file_path)
# Output Just 'What a walk!'
# Because there are neither subdirnames nor subfilenames in a single file !
# It is like:
# for i in []:
#     print('hi!')  # We are not going to execute this line.


# *Try to walk in a directory path
take_a_walk(a_directory_path)
# Output more than Just 'What a walk!'
# Also all the subdirnames and subfilenames in each file tree level.
# BTW if you want to look through all files in a directory, you can add
# another for subfilename in subfilenames loop inside.

# *Try to list all files and directories in a directory path
print('\n')
print(os.listdir(a_directory_path))
print('\n')

Can You Pass a File’s Filepath to os.walk?

Of course, you might wonder what will happen if we pass a file’s filepath, maybe a Python module filepath string like './learn_os_walk.py' to the os.walk function.

This is exactly what I wondered about when I started using this function. The simple answer is that the code inside the for loop never executes.

For example, suppose you run code like this in our learn_os_walk.py:

import os

a_file_path = './learn_os_walk.py'  # same as a_file_path = __file__

def take_a_walk(fp, topdown_flag=False):
    print(f'\ntopdown_flag:{topdown_flag}\n')
    for pathname, subdirnames, subfilenames in os.walk(fp, topdown=topdown_flag):
        print(pathname)
        print(subdirnames)
        print(subfilenames)
        print('--------------------------------')
    print('What a walk!')

# *Try to walk in a file path
take_a_walk(a_file_path)

The only output is the topdown_flag:False header followed by 'What a walk!'.

Why is that?

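Because a single file cannot be scanned as a directory, so there are neither subdirnames nor subfilenames to report! Internally, os.walk fails to list the entries of top, and since onerror defaults to None, that error is silently ignored and nothing is yielded. It is like writing the code below: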

for i in []:
    print('hi!')

And you will not get any 'hi' output because there is no element in an empty list.

Now, I hope you understand why the official doc tells us to pass a path to a directory instead of a file’s filepath 🙂
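If you would like to see the hidden error for yourself, the sketch below passes a temporary file to os.walk and collects whatever onerror receives. The temporary file stands in for learn_os_walk.py, and using errors.append as the handler is just one convenient choice for this illustration.

```python
import os
import tempfile

errors = []

with tempfile.NamedTemporaryFile(suffix='.py') as handle:
    # handle.name is the path of a regular file, not a directory.
    yielded = list(os.walk(handle.name, onerror=errors.append))

print(yielded)                                  # nothing was yielded
print([type(error).__name__ for error in errors])  # the error os.walk swallowed
```

On the systems I am aware of, the collected error is a NotADirectoryError, which is a subclass of OSError. Checking os.path.isdir(path) before walking is a simple way to catch this case early.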

os.walk vs os.listdir — When to Use Each?

Programmers often ask about the difference between os.walk and os.listdir.

The simple answer is:

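The os.listdir() function returns a list of the names of every file and folder directly inside one directory. The os.walk() function returns a generator that visits every directory in an entire file tree and yields the directory names and filenames found in each one.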

Well, if you feel a little bit uncertain, we can then use code examples to help us understand better!

We will stick to our same example directory tree as below:

In this case, we call os.listdir() and pass it the directory path of learn_os_walk, as in the code below:

import os

a_directory_path = './learn_os_walk'

# *Try to list all files and directories in a directory path
print('\n')
print(os.listdir(a_directory_path))
print('\n')

We get output like this:

That’s it! Only the first layer of the directory tree is included. In other words, os.listdir() cares only about what is directly in the root directory instead of searching through the entire directory tree, as we saw before in the os.walk example.
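The contrast can also be checked on a small throwaway tree. This is a sketch with an invented layout (sub, a.txt, b.txt), not the article's example tree.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout: top/ holds sub/ and a.txt; sub/ holds b.txt.
    os.mkdir(os.path.join(top, 'sub'))
    for relative in ('a.txt', os.path.join('sub', 'b.txt')):
        with open(os.path.join(top, relative), 'w'):
            pass

    # os.listdir: one level only, directories and files mixed together.
    listed = sorted(os.listdir(top))

    # os.walk: every level, with filenames collected from each directory.
    walked_files = sorted(
        name for _, _, filenames in os.walk(top) for name in filenames
    )

print(listed)        # ['a.txt', 'sub']
print(walked_files)  # ['a.txt', 'b.txt']
```

Here b.txt shows up only in the os.walk result because it lives one level below the root.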

Summary

If you want a list of all filenames and directory names directly inside a directory, go with os.listdir(). If you want to iterate over an entire directory tree, use os.walk().

Now, I hope you understand when to use os.listdir and when to use os.walk 🙂

os.walk() Recursive: How to Traverse a Directory Tree?

Our last question about os.walk is how to actually iterate over the entire directory tree.

Concretely, we have some small goals for our next example:

  • Iterate over all files within a directory tree
  • Iterate over all directories within a directory tree

All examples below are still based on our old friend, the example directory tree:

Iterate over all files within a directory tree

First, let’s iterate over all files within a directory tree. This can be achieved with a nested for loop in Python.

The potential application could be some sanity checks or number counts for all files within one folder. How about counting the number of .txt files within one folder? Let’s do it!

The code for this application is:

import os

a_directory_path = './learn_os_walk'
total_file = 0

for pathname, subdirnames, subfilenames in os.walk(a_directory_path):
    for subfilename in subfilenames:
        if subfilename.endswith('.txt'):
            total_file += 1
print(f'\n{total_file}\n')

As you can see, we use another for loop to iterate over subfilenames to get every file within a directory tree. The output is 7, which is correct according to our example directory tree.
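Often you need the location of each file and not only a count. A possible variation, sketched below, joins pathname and subfilename into a full path. The helper name find_txt_files is made up for this illustration.

```python
import os


def find_txt_files(top):
    """Return the full path of every .txt file under top (hypothetical helper)."""
    matches = []
    for pathname, _subdirnames, subfilenames in os.walk(top):
        for subfilename in subfilenames:
            if subfilename.endswith('.txt'):
                # subfilename is a bare name, so join it with its directory.
                matches.append(os.path.join(pathname, subfilename))
    return matches


txt_files = find_txt_files('./learn_os_walk')
print(len(txt_files))
for txt_file in txt_files:
    print(txt_file)
```

The length of the returned list gives the same count as the snippet above, and each entry can be passed straight to open().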

The full code for this example can be found here.

Iterate over all directories within a directory tree

Last, we can also iterate over all directories within a directory tree. This can be achieved by a nested for loop in Python.

The potential application could again be some sanity checks or counts for all directories within one folder. For our example, let’s check whether every directory contains an __init__.py file and add an empty __init__.py file if it does not.

💡 Idea: The __init__.py file signifies whether the entire directory is a Python package or not.

The code for this application is:

import os

a_directory_path = './learn_os_walk'

for pathname, subdirnames, subfilenames in os.walk(a_directory_path):
    for subdirname in subdirnames:
        init_filepath = os.path.join(pathname, subdirname, '__init__.py')
        if not os.path.exists(init_filepath):
            print(f'Create a new empty [{init_filepath}] file.')
            with open(init_filepath, 'w') as f:
                pass

As you can see, we use another for loop to iterate over subdirnames to get every directory within a directory tree. Note that this covers every directory below the root, but not the root directory itself.

Before the execution, our directory tree, printed with the take_a_walk function mentioned before, looks like this:

After the execution, we can take a walk along the directory tree again and get a result like this:

Hooray! We successfully iterated over every directory within the directory tree and completed the __init__.py sanity check.
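If you prefer to look before you write, a dry-run variation is sketched below. It checks pathname in each yielded tuple instead of looping over subdirnames, so the root directory is included as well, and it only reports what is missing. The helper name directories_missing_init is made up for this illustration.

```python
import os


def directories_missing_init(top):
    """List every directory under top, top included, without an __init__.py.

    Hypothetical helper: it reports only and creates no files.
    """
    missing = []
    for pathname, _subdirnames, subfilenames in os.walk(top):
        if '__init__.py' not in subfilenames:
            missing.append(pathname)
    return missing


for directory in directories_missing_init('./learn_os_walk'):
    print(f'Missing __init__.py: {directory}')
```

Whether the root should be checked too depends on your project, so treat this as one possible design and not the only correct one.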

The full code for this example can be found here.

In summary, you can use os.walk with a nested for loop to traverse every file or directory within a directory tree.

Conclusion

That’s it for our os.walk() article!

We learned about its syntax, its inputs and outputs, and the difference between os.walk and os.listdir.

We also worked through real usage examples: changing the search direction with the topdown parameter, counting .txt files, and running an __init__.py sanity check.

I hope you enjoyed all this, and happy coding!


About the Author

Anqi Wu is an aspiring Data Scientist and self-employed Technical Consultant. She is an incoming student for a Master’s program in Data Science and builds her technical consultant profile on Upwork.

Anqi is passionate about machine learning, statistics, data mining, programming, and many other data science related fields. During her undergraduate years, she proved her expertise with multiple wins and top placements in mathematical modeling contests. She loves supporting and enabling data-driven decision-making, developing data services, and teaching.

Here is a link to the author’s personal website: https://www.anqiwu.one/. She uploads data science blogs weekly there to document her data science learning and practicing for the past week, along with some best learning resources and inspirational thoughts.

Hope you enjoy this article! Cheers!