According to the official Python 3.10.3 documentation, the os module provides miscellaneous operating system interfaces. We can achieve many operating-system-dependent functionalities through it. One of them is generating the file names in a directory tree through os.walk().
If it sounds great to you, please continue reading, and you will fully understand os.walk through Python code snippets and vivid visualization.
In this article, I will first introduce the usage of os.walk and then address three top questions about os.walk: passing a file’s filepath to os.walk, os.walk vs. os.listdir, and using os.walk to traverse a directory tree recursively.
How to Use os.walk and the topdown Parameter?
Syntax
Here is the syntax for os.walk:
    os.walk(top, topdown=True, onerror=None, followlinks=False)
Input
1. Must-have parameters:
- top: accepts the path string of the directory that you want to use as the root to generate filenames. (Passing a file’s path yields nothing, as discussed later in this article.)
2. Optional parameters:
- topdown: accepts a boolean value, default=True. If True or not specified, each directory is reported before its subdirectories (top-down). Otherwise, each directory is reported after its subdirectories (bottom-up). If you are still confused about this topdown parameter, as I was when I first got to know os.walk, there is a visualization in the example below.
- onerror: accepts a function with one argument, default=None. By default, errors raised while listing a directory are ignored. If you pass a function, it is called with the OSError instance; it can report the error and let the walk continue, or raise the exception to abort the walk.
- followlinks: accepts a boolean value, default=False. If True, we visit directories pointed to by symlinks, on systems that support them.
💡 Tip: Generally, you only need the first two parameters, top and topdown.
Output
Yields 3-tuples (dirpath, dirnames, filenames) for each directory in the tree rooted at directory top (including top itself).
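To make the shape of these 3-tuples concrete, here is a minimal sketch. It builds a small throwaway tree with tempfile (the names sub, a.txt, and b.txt are made up for illustration and are not part of the learn_os_walk example) and collects what os.walk yields.

```python
import os
import tempfile

# Build a tiny throwaway tree so the example does not depend on learn_os_walk.
# The names "sub", "a.txt" and "b.txt" are hypothetical.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    open(os.path.join(root, "a.txt"), "w").close()
    open(os.path.join(root, "sub", "b.txt"), "w").close()

    # Collect every (dirpath, dirnames, filenames) tuple,
    # showing dirpath relative to the root for readability.
    walked = [
        (os.path.relpath(dirpath, root), dirnames, filenames)
        for dirpath, dirnames, filenames in os.walk(root)
    ]

for entry in walked:
    print(entry)
```

With this layout, the first tuple describes the root itself and the second describes sub, so each directory in the tree gets exactly one tuple.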
Example
I think the best way to comprehend os.walk is to walk through an example.
Our example directory tree and its labels are:
By the way, the difference between a directory and a file is that a directory can contain files and other directories; for example, directory D above contains 4.txt and 5.txt.
Back to our example, our goal is to:
- Generate filenames based on the root directory, learn_os_walk
- Understand the difference between topdown=True and topdown=False
To use the os.walk() function, we first need to import the os module:
    import os
Then we can pass the input parameters to os.walk and generate filenames. The code snippet is:
    a_directory_path = './learn_os_walk'


    def take_a_walk(fp, topdown_flag=True):
        print(f'\ntopdown_flag:{topdown_flag}\n')
        for pathname, subdirnames, subfilenames in os.walk(fp, topdown=topdown_flag):
            print(pathname)
            print(subdirnames)
            print(subfilenames)
            print('--------------------------------')
        print('What a walk!')


    # *Try to walk in a directory path
    take_a_walk(a_directory_path)
    # Output more than Just 'What a walk!'
    # Also all the subdirnames and subfilenames in each file tree level.
    # BTW if you want to look through all files in a directory, you can add
    # another for subfilename in subfilenames loop inside.
The above code defines a function, take_a_walk, that uses os.walk in a for loop. This is the most common usage of os.walk: it lets you iterate over every directory level, and the filenames in it, starting from the root directory.
If you have advanced knowledge of Python generators, you have probably already figured out that os.walk actually gives you a generator that yields one 3-tuple after another.
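The generator behavior can be seen by pulling tuples out by hand with next() instead of a for loop. This is only a sketch on a throwaway directory; the subdirectory name child is an assumption made for illustration.

```python
import inspect
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "child"))  # hypothetical subdirectory name

    walker = os.walk(root)
    is_generator = inspect.isgenerator(walker)

    first = next(walker)        # the root directory itself
    second = next(walker)       # the "child" subdirectory
    third = next(walker, None)  # nothing left, so the default None comes back

    first_is_root = first == (root, ["child"], [])
    second_is_child = second == (os.path.join(root, "child"), [], [])

print(is_generator, first_is_root, second_is_child, third)
```

Because the tuples are produced lazily, nothing is scanned until you ask for the next item, which is why a for loop is the natural way to consume os.walk.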
Back in this code, we set the topdown argument to True. Visually, the top-down search order looks like the orange arrow in the picture below:
And if we run the above code, we get the result below:
If we set topdown to False, we walk the directory tree starting from its bottom directory, D, like this:
The corresponding code snippet is:
    import os

    a_directory_path = './learn_os_walk'


    def take_a_walk(fp, topdown_flag=False):
        print(f'\ntopdown_flag:{topdown_flag}\n')
        for pathname, subdirnames, subfilenames in os.walk(fp, topdown=topdown_flag):
            print(pathname)
            print(subdirnames)
            print(subfilenames)
            print('--------------------------------')
        print('What a walk!')


    # *Try to walk in a directory path
    take_a_walk(a_directory_path)
    # Output more than Just 'What a walk!'
    # Also all the subdirnames and subfilenames in each file tree level.
    # BTW if you want to look through all files in a directory, you can add
    # another for subfilename in subfilenames loop inside.
And if we run the above code, we get the result below:
Now, I hope you understand how to use os.walk and the difference between topdown=True and topdown=False. 🙂
Here’s the full code for this example:
    __author__ = 'Anqi Wu'

    import os

    a_directory_path = './learn_os_walk'
    a_file_path = './learn_os_walk.py'  # same as a_file_path = __file__


    def take_a_walk(fp, topdown_flag=True):
        print(f'\ntopdown_flag:{topdown_flag}\n')
        for pathname, subdirnames, subfilenames in os.walk(fp, topdown=topdown_flag):
            print(pathname)
            print(subdirnames)
            print(subfilenames)
            print('--------------------------------')
        print('What a walk!')


    # *Try to walk in a file path
    take_a_walk(a_file_path)
    # Output Just 'What a walk!'
    # Because there are neither subdirnames nor subfilenames in a single file !
    # It is like:
    # for i in []:
    #     print('hi!')  # We are not going to execute this line.

    # *Try to walk in a directory path
    take_a_walk(a_directory_path)
    # Output more than Just 'What a walk!'
    # Also all the subdirnames and subfilenames in each file tree level.
    # BTW if you want to look through all files in a directory, you can add
    # another for subfilename in subfilenames loop inside.

    # *Try to list all files and directories in a directory path
    print('\n')
    print(os.listdir(a_directory_path))
    print('\n')
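If you do not have the learn_os_walk folder at hand, the two visiting orders can also be compared on a throwaway tree. The sketch below is an illustration under assumed names: the helper visit_order and the nested directories A and B are hypothetical and not part of the original example.

```python
import os
import tempfile


def visit_order(top, topdown):
    """Return the directories in the order os.walk reports them, relative to top."""
    return [
        os.path.relpath(dirpath, top)
        for dirpath, _dirnames, _filenames in os.walk(top, topdown=topdown)
    ]


with tempfile.TemporaryDirectory() as root:
    # A single chain root -> A -> B keeps the order easy to follow.
    os.makedirs(os.path.join(root, "A", "B"))

    top_down = visit_order(root, topdown=True)
    bottom_up = visit_order(root, topdown=False)

print(top_down)   # root first, deepest directory last
print(bottom_up)  # deepest directory first, root last
```

For a single chain of directories like this one, the bottom-up order is simply the top-down order reversed. With several sibling directories, the relative order of the siblings depends on how the operating system lists them.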
Can You Pass a File’s Filepath to os.walk?
Of course, you might wonder what will happen if we pass a file’s filepath, maybe a Python module filepath string like './learn_os_walk.py', to the os.walk function.
This is exactly what I wondered when I started using this function. The simple answer is that the code under the for loop will not execute.
For example, suppose you run code like this in our learn_os_walk.py:
    import os

    a_file_path = './learn_os_walk.py'  # same as a_file_path = __file__


    def take_a_walk(fp, topdown_flag=False):
        print(f'\ntopdown_flag:{topdown_flag}\n')
        for pathname, subdirnames, subfilenames in os.walk(fp, topdown=topdown_flag):
            print(pathname)
            print(subdirnames)
            print(subfilenames)
            print('--------------------------------')
        print('What a walk!')


    # *Try to walk in a file path
    take_a_walk(a_file_path)
The only output would be like this:
Why is that?
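Because a single file has neither subdirectories nor files inside it. Under the hood, os.walk tries to list the given path as a directory; that fails for a file, and with the default onerror=None the error is silently ignored, so the generator yields nothing. It is like writing the code below:
    for i in []:
        print('hi!')
And you will not get any 'hi!' output because there is no element in an empty list.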
Now, I hope you understand why the official doc tells us to pass a path to a directory instead of a file’s filepath 🙂
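To see the error that is otherwise swallowed, you can pass a callback through the onerror parameter. The sketch below uses a temporary file with a made-up name and simply collects the reported exceptions in a list. On common platforms I would expect a NotADirectoryError, but treat the exact exception subclass as an assumption.

```python
import os
import tempfile

errors = []  # every OSError reported by os.walk ends up here

with tempfile.TemporaryDirectory() as root:
    file_path = os.path.join(root, "not_a_directory.py")  # hypothetical file name
    open(file_path, "w").close()

    # errors.append serves as the one-argument onerror callback.
    results = list(os.walk(file_path, onerror=errors.append))

print(results)                             # the walk yields nothing for a file
print([type(e).__name__ for e in errors])  # the error that is ignored by default
```

Collecting errors this way keeps the walk going; raising inside the callback instead would abort it.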
os.walk vs os.listdir — When to Use Each?
A top question among programmers concerns the difference between os.walk and os.listdir.
The simple answer is:
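The os.listdir() function returns a list of the names of every file and folder directly inside a directory. The os.walk() function returns a generator that visits every directory in an entire file tree and yields the subdirectory names and filenames found in each one.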
Well, if you feel a little bit uncertain, we can then use code examples to help us understand better!
We will stick to our same example directory tree as below:
In this case, we call os.listdir() and pass it the directory path of learn_os_walk, as in the code below:
    import os

    a_directory_path = './learn_os_walk'

    # *Try to list all files and directories in a directory path
    print('\n')
    print(os.listdir(a_directory_path))
    print('\n')
And we will get an output like:
That’s it! Only the first layer of the directory tree is included. In other words, os.listdir() cares only about what is directly in the root directory, instead of searching through the entire directory tree as we saw before in the os.walk example.
Summary
If you want a list of all filenames and directory names directly within a root directory, go with os.listdir(). If you want to iterate over an entire directory tree, consider os.walk().
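The difference can be checked side by side on a small throwaway tree. This is a sketch with made-up names (sub, top.txt, deep.txt), not the learn_os_walk example itself.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one file at the top, one file a level deeper.
    os.mkdir(os.path.join(root, "sub"))
    open(os.path.join(root, "top.txt"), "w").close()
    open(os.path.join(root, "sub", "deep.txt"), "w").close()

    # os.listdir: names directly inside root, files and folders mixed.
    direct_entries = sorted(os.listdir(root))

    # os.walk: every file in the whole tree, shown relative to root.
    all_files = sorted(
        os.path.relpath(os.path.join(dirpath, name), root)
        for dirpath, _dirnames, filenames in os.walk(root)
        for name in filenames
    )

print(direct_entries)
print(all_files)
```

Note that deep.txt only shows up in the os.walk result, while the folder sub only shows up as a name in the os.listdir result.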
Now, I hope you understand when to use os.listdir and when to use os.walk 🙂
os.walk() Recursive — How to traverse a Directory Tree?
Our last question about os.walk is how to actually iterate over the entire directory tree.
Concretely, we have some small goals for our next example:
- Iterate over all files within a directory tree
- Iterate over all directories within a directory tree
All examples below are still based on our old friend, the example directory tree:
Iterate over all files within a directory tree
First, let’s iterate over all files within a directory tree. This can be achieved with a nested for loop in Python.
A potential application is a sanity check or a count over all files within one folder. How about counting the number of .txt files within one folder? Let’s do it!
The code for this application is:
    import os

    a_directory_path = './learn_os_walk'

    total_file = 0
    for pathname, subdirnames, subfilenames in os.walk(a_directory_path):
        for subfilename in subfilenames:
            if subfilename.endswith('.txt'):
                total_file += 1

    print(f'\n{total_file}\n')
As you can see, we use another for loop to iterate over subfilenames to get every file within the directory tree. The output is 7, which is correct according to our example directory tree.
The full code for this example can be found here.
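If you need this kind of count more than once, the nested loop can be wrapped in a small function. The sketch below is one possible variation, not the original code: the helper name count_files and the file names in the throwaway tree are assumptions.

```python
import os
import tempfile


def count_files(top, suffix):
    """Count the files under top whose names end with suffix."""
    return sum(
        name.endswith(suffix)  # True counts as 1, False as 0
        for _dirpath, _dirnames, filenames in os.walk(top)
        for name in filenames
    )


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    # Hypothetical files: three .txt files and one .py file.
    for relative in ("1.txt", "notes.py",
                     os.path.join("sub", "2.txt"),
                     os.path.join("sub", "3.txt")):
        open(os.path.join(root, relative), "w").close()

    txt_count = count_files(root, ".txt")
    py_count = count_files(root, ".py")

print(txt_count, py_count)
```

The generator expression is the same nested loop as before, just written inline: the outer part walks the directories and the inner part walks the filenames of each one.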
Iterate over all directories within a directory tree
Last, we can also iterate over all directories within a directory tree. This can be achieved by a nested for loop in Python.
A potential application could again be a sanity check or a count over all directories within one folder. For our example, let’s check whether every directory contains an __init__.py file and add an empty __init__.py file if not.
💡 Idea: The __init__.py file marks a directory as a regular Python package.
The code for this application is:
    import os

    a_directory_path = './learn_os_walk'

    for pathname, subdirnames, subfilenames in os.walk(a_directory_path):
        for subdirname in subdirnames:
            init_filepath = os.path.join(pathname, subdirname, '__init__.py')
            if not os.path.exists(init_filepath):
                print(f'Create a new empty [{init_filepath}] file.')
                # Opening in write mode creates the empty file.
                with open(init_filepath, 'w') as f:
                    pass
As you can see, we use another for loop to iterate over subdirnames to get every directory within the directory tree.
Before the execution, our directory tree, as printed by the take_a_walk function mentioned before, looks like this:
After the execution, we can take a walk along the directory tree again, and we get a result like this:
Hooray! We successfully iterated over every directory within the directory tree and completed the __init__.py sanity check.
The full code for this example can be found here.
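A read-only variation of this check is sketched below. Instead of looping over subdirnames, it looks at the filenames reported for each visited directory, which also covers the root directory itself. The helper name dirs_missing_init and the pkg/sub layout are assumptions made for illustration, and nothing is created on disk by the check.

```python
import os
import tempfile


def dirs_missing_init(top):
    """List directories under top (top included) that lack an __init__.py."""
    return sorted(
        os.path.relpath(dirpath, top)
        for dirpath, _dirnames, filenames in os.walk(top)
        if "__init__.py" not in filenames
    )


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: pkg has an __init__.py, pkg/sub and the root do not.
    os.makedirs(os.path.join(root, "pkg", "sub"))
    open(os.path.join(root, "pkg", "__init__.py"), "w").close()

    missing = dirs_missing_init(root)

print(missing)
```

Running a report like this first can be a safer choice than creating files right away, since you can review the list before changing anything.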
In summary, you can use os.walk to recursively traverse every file or directory within a directory tree through a nested for loop.
Conclusion
That’s it for our os.walk() article!
We learned about its syntax, its input-output relationship, and the difference between os.walk and os.listdir.
We also worked on real usage examples, including changing the search direction through the topdown parameter, counting .txt files, and an __init__.py sanity check.
Hope you enjoy all this and happy coding!
About the Author
Anqi Wu is an aspiring Data Scientist and self-employed Technical Consultant. She is an incoming student for a Master’s program in Data Science and builds her technical consultant profile on Upwork.
Anqi is passionate about machine learning, statistics, data mining, programming, and many other data science related fields. During her undergraduate years, she has proven her expertise, including multiple winning and top placements in mathematical modeling contests. She loves supporting and enabling data-driven decision-making, developing data services, and teaching.
Here is a link to the author’s personal website: https://www.anqiwu.one/. She uploads data science blogs weekly there to document her data science learning and practicing for the past week, along with some best learning resources and inspirational thoughts.
Hope you enjoy this article! Cheers!