The os.walk function
The walk function of the os module takes one required argument and several optional ones. The required argument must be the path of a directory.
The walk() function returns a generator object that yields tuples. Each tuple "describes" the next directory in the directory tree passed to the function.
Each tuple consists of three elements:
- The path of the current directory, as a string.
- A list of the names of the immediate subdirectories of this directory. If there are no subdirectories, the list is empty.
- A list of the names of the files directly inside this directory. If there are no files, the list is empty.
Suppose we have the following directory tree:
Let's pass the test directory to os.walk():
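The article's own tree is not shown, so this minimal sketch builds a hypothetical test directory (test/a.txt, test/sub1/b.txt and an empty test/sub2) in a temporary folder and walks it with a relative path. The helper name make_sample_tree is for illustration only.

```python
import os
import tempfile

def make_sample_tree(base):
    """Hypothetical helper: build base/test with two subdirectories and two files."""
    os.makedirs(os.path.join(base, "test", "sub1"))
    os.makedirs(os.path.join(base, "test", "sub2"))
    open(os.path.join(base, "test", "a.txt"), "w").close()
    open(os.path.join(base, "test", "sub1", "b.txt"), "w").close()

base = tempfile.mkdtemp()
make_sample_tree(base)

results = []
old_cwd = os.getcwd()
os.chdir(base)  # work from the parent folder so "test" is a relative path
try:
    for dirpath, dirnames, filenames in os.walk("test"):
        # The order of names inside each list is not guaranteed, so sort copies
        results.append((dirpath, sorted(dirnames), sorted(filenames)))
finally:
    os.chdir(old_cwd)

for dirpath, dirnames, filenames in results:
    print(dirpath, dirnames, filenames)
```

The first tuple is always the starting directory itself, here ('test', ['sub1', 'sub2'], ['a.txt']). The order in which sub1 and sub2 are visited after it depends on the file system.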
If you pass an absolute path, the directory paths will also be absolute:
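This minimal sketch uses another hypothetical temporary tree. Passing os.path.abspath() of the starting directory makes every dirpath that walk() yields absolute as well, because each one is built by joining names onto top.

```python
import os
import tempfile

# Hypothetical tree: <tmp>/test/sub/file.txt
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "test", "sub"))
open(os.path.join(base, "test", "sub", "file.txt"), "w").close()

top = os.path.abspath(os.path.join(base, "test"))
dirpaths = [dirpath for dirpath, _, _ in os.walk(top)]

for p in dirpaths:
    print(p, os.path.isabs(p))  # every path starts with top and is absolute
```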
Since walk() returns a generator, you cannot extract data from it a second time. So if you need to keep the tuples, you can "turn" the generator into a list of tuples:
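This short sketch, run on a hypothetical temporary folder with one subdirectory, shows that a second pass over the same generator yields nothing. Storing the tuples once with list() makes them reusable.

```python
import os
import tempfile

base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "sub"))

gen = os.walk(base)
first_pass = list(gen)   # consumes the generator
second_pass = list(gen)  # the generator is already exhausted

print(len(first_pass), len(second_pass))  # → 2 0

# To reuse the tuples, build the list once and iterate over the list instead
tree = list(os.walk(base))
```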
To get the full path of a file (absolute or relative), use the os.path.join function:
On each iteration, the variable address is bound to the first element of the current tuple (the string containing the directory path), dirs to the second element (the list of subdirectories), and files to the list of files in that directory. The nested loop extracts each file name from the list of files.
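The loop described above can be sketched like this. It reuses the variable names address, dirs and files from the text and walks a hypothetical temporary tree, collecting the full path of every file.

```python
import os
import tempfile

# Hypothetical tree: test/a.txt and test/sub/b.txt
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "test", "sub"))
open(os.path.join(base, "test", "a.txt"), "w").close()
open(os.path.join(base, "test", "sub", "b.txt"), "w").close()

full_paths = []
for address, dirs, files in os.walk(os.path.join(base, "test")):
    for name in files:
        # Join the directory path with the bare file name
        full_paths.append(os.path.join(address, name))

for path in full_paths:
    print(path)
```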
The walk function has a topdown argument, which is True by default. If you set it to False, the tree is traversed not top-down (from the root to the nested directories) but bottom-up (subdirectories come first).
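To make the two orders easy to compare, this sketch uses a hypothetical chain of nested folders (base/a/b). With a single chain the visiting order is fully determined, and bottom-up is exactly the reverse of top-down.

```python
import os
import tempfile

base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "a", "b"))  # chain: base -> a -> b

top_down = [p for p, _, _ in os.walk(base)]
bottom_up = [p for p, _, _ in os.walk(base, topdown=False)]

print(top_down)   # base, base/a, base/a/b
print(bottom_up)  # base/a/b, base/a, base
```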
Manipulating Python’s os.walk()
Python’s os.walk() is a function that walks a directory tree, yielding lists of directory names and file names. If you’re not close friends, though, it can appear tricky to control. It may flood your screen with hidden files or generally have poor boundaries! This is my effort for you guys to get to know each other a bit better.
How do I use os.walk()?
A standard way of walking down a path with os.walk() is making a loop:
Try it out in your Python REPL to get the gist of it. On each iteration, it prints the root directory path, a list of the dirs in it and a list of the files in it.
Another way of doing it would be to use the os.path.join() function, which lets you print the full paths of the directories and files.
Which will result in a clean print:
But how can I tweak os.walk() in order to produce a customized tree?
What if you want to:
- Not go into hidden directories
- Not list a certain type of files
- Not go into directories that match a path pattern
The plain os.walk() loop can seem like an unstoppable force of nature once provided with a path. It will go through every directory and every file until it can’t go no more! Python’s docs hint at a solution for that, but let’s make it more comprehensible.
How can I exclude hidden directories from os.walk()?
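The answer lies in filtering the dirs list in place: build a filtered copy and assign it back into the original list with slice assignment (dirs[:] = ...), so that os.walk() sees the change.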
With a list comprehension, now our list of directories does not include hidden directories. Moreover, os.walk() won’t go into those directories at all. We can do the same with the files list.
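A minimal sketch of this technique, run on a hypothetical temporary tree with a .git folder and a .env file. Slice assignment on dirs prunes the walk. The files list only needs a plain filter, because os.walk() does not use it to decide where to descend. This relies on the default topdown=True.

```python
import os
import tempfile

base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, ".git", "objects"))
os.makedirs(os.path.join(base, "src"))
open(os.path.join(base, ".env"), "w").close()
open(os.path.join(base, "src", "main.py"), "w").close()

visited = []
listed_files = []
for root, dirs, files in os.walk(base):
    # Slice assignment changes the very list os.walk() holds, so hidden dirs are skipped
    dirs[:] = [d for d in dirs if not d.startswith(".")]
    # Rebinding is enough for files: they only affect what we collect
    files = [f for f in files if not f.startswith(".")]
    visited.append(root)
    listed_files.extend(os.path.join(root, f) for f in files)

print(visited)
print(listed_files)
```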
How can I exclude other specific directories?
For example, you may have another list of directory names that you want to ignore during your os.walk(). One way to do this would be the same as above, with a list comprehension. Another way of doing it would be to check the root each time, and in case it’s in the ignore list, empty both the dirs and the files list.
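The root-check variant can be sketched as follows. The IGNORE set and the folder names are hypothetical. Unlike filtering the parent's dirs list, this still yields each ignored directory once, but with empty lists, and walk() never descends below it.

```python
import os
import tempfile

base = tempfile.mkdtemp()
for d in (("node_modules", "pkg"), ("build",), ("src",)):
    os.makedirs(os.path.join(base, *d))
open(os.path.join(base, "src", "app.js"), "w").close()
open(os.path.join(base, "node_modules", "pkg", "index.js"), "w").close()

IGNORE = {"node_modules", "build"}  # hypothetical ignore list

kept = []
for root, dirs, files in os.walk(base):
    if os.path.basename(root) in IGNORE:
        dirs[:] = []   # in place: stop descending below this directory
        files[:] = []  # and don't report its files
    kept.extend(os.path.join(root, f) for f in files)

print(kept)
```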
This approach will serve you better if you’re not working with an ignore list but with a path pattern. A list comprehension over bare directory names cannot match a pattern that spans the path, so you have to check whether the root matches the pattern on each iteration. You can use fnmatch or glob for that.
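A minimal sketch with the standard fnmatch module. The pattern (any directory whose path ends in _build) and the tree are hypothetical. Note that fnmatch's * also matches path separators, so the pattern can test the whole root path.

```python
import fnmatch
import os
import tempfile

base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "docs", "_build", "html"))
os.makedirs(os.path.join(base, "docs", "source"))
open(os.path.join(base, "docs", "source", "index.rst"), "w").close()
open(os.path.join(base, "docs", "_build", "html", "index.html"), "w").close()

# Hypothetical pattern: skip every directory whose path ends in "_build"
PATTERN = "*" + os.sep + "_build"

kept = []
for root, dirs, files in os.walk(base):
    if fnmatch.fnmatch(root, PATTERN):
        dirs[:] = []   # don't descend into the matching directory
        files[:] = []  # and don't report its files
    kept.extend(os.path.join(root, f) for f in files)

print(kept)
```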
Python os.walk() Method with example
Hello dear readers! Welcome back to another section of my tutorial on Python. In this tutorial guide, we are going to study the os.walk() method.
The Python os.walk() method generates the file names in a directory tree by walking the tree either top-down or bottom-up.
Syntax
The syntax of the os.walk() method is:
os.walk(top, topdown=True, onerror=None, followlinks=False)
Parameter Details
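- top: the root directory of the tree to walk. For each directory in the tree rooted at top (including top itself), walk() yields a 3-tuple (dirpath, dirnames, filenames).
- topdown: if True or not specified, the directories are scanned top-down. If topdown is set to False, the directories are scanned bottom-up.
- onerror: by default, errors from the underlying os.scandir() call are ignored. If onerror is given, it must be a function; it is called with an OSError instance and can either report the error and let the walk continue, or raise the exception to abort the walk.
- followlinks: if set to True, walk() also visits directories pointed to by symbolic links. The default is False.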
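A short sketch of onerror, assuming a directory that does not exist (created as a missing child of a fresh temporary folder). Without onerror the walk silently yields nothing. With the hypothetical callback record_error, the FileNotFoundError is captured so it can be reported.

```python
import os
import tempfile

missing = os.path.join(tempfile.mkdtemp(), "does-not-exist")  # guaranteed absent

silent = list(os.walk(missing))  # default: the error is ignored

errors = []

def record_error(err):
    # err is an OSError; err.filename holds the path that could not be read
    errors.append(err)

results = list(os.walk(missing, onerror=record_error))

print(silent, results)  # → [] []
print(type(errors[0]).__name__, errors[0].filename)
```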
Return Value
This method returns a generator that yields a 3-tuple (dirpath, dirnames, filenames) for each directory in the tree.
Example
Here is a simple example:
Output
Running the above code scans all the directories and subdirectories bottom-up.
If you change the value of topdown to True, the directories are listed top-down instead.
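One practical reason to walk bottom-up is deleting a tree: by the time walk() yields a directory, all of its subdirectories have already been processed and emptied. A minimal sketch on a hypothetical temporary tree, assuming it contains no symbolic links (in practice shutil.rmtree() does this job).

```python
import os
import tempfile

base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "a", "b"))
open(os.path.join(base, "a", "b", "f.txt"), "w").close()
open(os.path.join(base, "top.txt"), "w").close()

def remove_contents(top):
    """Hypothetical helper: delete everything under top (not top itself), children first.

    Assumes no symbolic links inside top.
    """
    for root, dirs, files in os.walk(top, topdown=False):
        for name in files:
            os.remove(os.path.join(root, name))
        for name in dirs:
            # Each subdirectory was already emptied earlier in the bottom-up walk
            os.rmdir(os.path.join(root, name))

remove_contents(base)
print(os.listdir(base))  # → []
```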
Alright guys! This is where we round up this tutorial post. In my next tutorial, we are going to discuss the Python os.write() method.
Python os.walk() Example
The walk() function of Python's os module generates the file names in a directory tree by walking the tree in either direction: top-down or bottom-up. Every directory in the tree except the starting one has a parent directory and is a subdirectory of it. For each directory, walk() yields a 3-tuple: the directory path, the names of its subdirectories, and the names of the files it contains.
- Dirpath: a string containing the path to the directory.
- Dirnames: a list of the names of the subdirectories in dirpath (including symbolic links to directories, and excluding '.' and '..').
- Filenames: a list of the names of the non-directory files in dirpath. These may be system-created or user-created files.
The names in these lists do not contain any path components. To get a full path (starting with top) to a file or directory in dirpath, use os.path.join(dirpath, name).
Top-down and bottom-up are the two possible orders, and a single optional argument, topdown, chooses between them. If topdown is True or not specified (the default), the triple for a directory is generated before the triples for its subdirectories. If topdown is False, the triple for a directory is generated after the triples for all of its subdirectories; in other words, the sequence is bottom-up.
When topdown is True, the caller can modify the dirnames list in place, and walk() will then recurse only into the subdirectories whose names remain in the list. Modifying dirnames when topdown is False has no effect on the walk, because in bottom-up mode the directories in dirnames are generated before dirpath itself. By default, errors from the underlying os.scandir() call (os.listdir() in older Python versions) are ignored.
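This sketch demonstrates the difference on a hypothetical tree base/skip/inner. The helper walk_pruning is illustrative only. It removes "skip" from dirs in place and records every directory that walk() actually yields.

```python
import os
import tempfile

base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "skip", "inner"))

def walk_pruning(top, topdown):
    """Hypothetical helper: try to prune 'skip' and return every dirpath yielded."""
    seen = []
    for root, dirs, files in os.walk(top, topdown=topdown):
        seen.append(root)
        if "skip" in dirs:
            dirs.remove("skip")  # in-place change to the list walk() holds
    return seen

print(walk_pruning(base, topdown=True))   # only base: pruning worked
print(walk_pruning(base, topdown=False))  # inner, skip, base: pruning came too late
```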
How Python's os.walk() works
Let’s see how the file system is traversed in Python. It works like a tree having a single root that further divides into branches. And the branches are expanded as sub-branches and so on. This walk function outputs the names of files in a directory tree by navigating the tree either from the top or from the bottom.
Syntax of os.walk()
os.walk(top, topdown=True, onerror=None, followlinks=False)
top: the starting point of the traversal. For every directory in the tree, walk() yields a 3-tuple, as described at the start of the article.
topdown: if True (the default), the directories are scanned from the top down; if False, from the bottom up.
onerror: an optional function for handling errors. It is called with an OSError instance and can either report the error and let the walk continue, or raise the exception to abort the walk.
followlinks: if True, walk() also descends into directories that symbolic links point to. The default is False.
Note: setting followlinks to True can lead to infinite recursion if a link points to a parent directory of itself, because walk() does not keep track of the directories it has already visited.
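One way to guard against such loops is sketched here as an assumption, not as anything os.walk() does itself. The hypothetical wrapper walk_no_revisit identifies each directory by its (st_dev, st_ino) pair and prunes any directory it has already seen. The demo needs a platform that allows creating symbolic links.

```python
import os
import tempfile

def walk_no_revisit(top):
    """Hypothetical wrapper: os.walk(followlinks=True) that skips directories seen before.

    Directories are identified by (st_dev, st_ino), so a symlink that points
    back to an ancestor is detected and not descended into again.
    """
    seen = set()
    for dirpath, dirnames, filenames in os.walk(top, followlinks=True):
        st = os.stat(dirpath)  # follows symlinks to the real directory
        key = (st.st_dev, st.st_ino)
        if key in seen:
            dirnames[:] = []  # stop os.walk() from descending further
            continue
        seen.add(key)
        yield dirpath, dirnames, filenames

# Demo: base/a/loop -> base creates a cycle
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "a"))
os.symlink(base, os.path.join(base, "a", "loop"), target_is_directory=True)

paths = [p for p, _, _ in walk_no_revisit(base)]
print(paths)  # base and base/a only; the loop is entered once and pruned
```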
Example 1
This example lists all the files in the current directory tree. The first step is to import the os module, which is part of Python's standard library.
After that, we define a function named os_module(). Inside this function, a for loop goes through the root, the directories and the files. The walk is done bottom-up (topdown=False), and followlinks is set to True.
import os

# The os module provides functions for working with files and folders.
# '.' signifies the current folder.
# os.walk() generates the file names in a directory tree by walking the
# tree either top-down or bottom-up.

def os_module():
    for root, dirs, files in os.walk('.', topdown=False, onerror=None, followlinks=True):
        for filename in files:
            print(filename)  # bare file name, without its directory

def main():
    print('\nUsing the os module to list the files...\n')
    os_module()

if __name__ == '__main__':
    main()
This program prints only the file names, without their directory paths. The '.' argument refers to the current folder, and main() calls the os_module() function defined above.
In the output, you can see the names of the files in the current folder and all of its subfolders.
Example 2
This example scans all the directories and subdirectories under the current directory from the bottom up, because topdown=False is passed.
A for loop prints the files and the directories separately, and os.path.join() combines each name with its directory path (root) to form the full path.
To get the output in top-down order instead, set topdown to True.
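The code for this example is not shown, so here is a minimal sketch of the described loop, run on a hypothetical temporary tree. The helper list_bottom_up is illustrative. It collects full file and directory paths in bottom-up order instead of printing them directly, which makes the order easy to check.

```python
import os
import tempfile

def list_bottom_up(path):
    """Hypothetical helper: return (file_paths, dir_paths) as full paths, bottom-up."""
    file_paths, dir_paths = [], []
    for root, dirs, files in os.walk(path, topdown=False):
        for name in files:
            file_paths.append(os.path.join(root, name))
        for name in dirs:
            dir_paths.append(os.path.join(root, name))
    return file_paths, dir_paths

base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "x", "y"))
open(os.path.join(base, "x", "y", "z.txt"), "w").close()

file_paths, dir_paths = list_bottom_up(base)
for p in file_paths + dir_paths:
    print(p)
```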
Example 3
This example differs from the previous ones in the parameters used: here os.walk() is called with only the path. A for loop displays the path, directories and files. An if statement limits how many values are printed on each line; here we use 4, so after every fourth value the output moves to the next line. The counter variable i starts at zero.
The output displays the path, directories, and files.
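A minimal sketch of the four-per-line idea, since the original code is not shown. The helper rows_of and the sample folder with six files are hypothetical. The counter comes from enumerate(), which starts at zero, and a new row begins after every fourth value.

```python
import os
import tempfile

def rows_of(path, per_line=4):
    """Hypothetical helper: group the path, dir and file names from os.walk(path) into rows."""
    values = []
    for root, dirs, files in os.walk(path):
        values.append(root)
        values.extend(dirs)
        values.extend(files)
    rows, row = [], []
    for i, value in enumerate(values):  # i starts at zero
        row.append(value)
        if (i + 1) % per_line == 0:  # after every 4th value, start a new line
            rows.append(row)
            row = []
    if row:
        rows.append(row)
    return rows

base = tempfile.mkdtemp()
for n in range(6):
    open(os.path.join(base, "f%d.txt" % n), "w").close()

rows = rows_of(base)
for row in rows:
    print(" ".join(row))
```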
Example 4
As an alternative to os.walk(), you can use os.listdir(), which returns the names of the entries in a single directory. Here the path is passed as an argument to the function, the result is stored in the files variable, and a for loop prints every entry in that directory. Unlike os.walk(), os.listdir() does not descend into subdirectories.
The output is the list of entries in that directory.
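A short sketch of the difference on a hypothetical temporary folder. os.listdir() returns both files and subdirectory names from one level only, while os.walk() also reaches files in subdirectories.

```python
import os
import tempfile

base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "sub"))
open(os.path.join(base, "top.txt"), "w").close()
open(os.path.join(base, "sub", "nested.txt"), "w").close()

# os.listdir(): names in this one directory only (files and subdirectories)
entries = sorted(os.listdir(base))
only_files = [n for n in entries if os.path.isfile(os.path.join(base, n))]

# os.walk(): descends into sub/ as well
walked_files = sorted(n for _, _, files in os.walk(base) for n in files)

print(entries)       # → ['sub', 'top.txt']
print(only_files)    # → ['top.txt']
print(walked_files)  # → ['nested.txt', 'top.txt']
```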
Example 5
In the previous examples, every folder was displayed, including the hidden ones we may not want to see. With os.walk(), you can exclude hidden directories by filtering the list of directories.
After importing the os module, we define the path to use in the example.
Filtering the list of directories this way hides them: the hidden directories are not included in the output.
Example 6
Suppose you have a list of directory names that you want to skip during the walk() call. One way is to use the method described above. A second way, explained here, gives the same result.
Example 7
If you want to print the names of the subdirectories and files rather than the full path of each entry, you can use the os.walk() function as well.
From the output, you can see that only these names are printed.
Conclusion
The Python os.walk() function traverses all the paths in a directory tree, either from top to bottom or from bottom to top. We have also seen how to keep unwanted entries out of the output. Hopefully this article helps you use the walk function of the os module in Python.
About the author
Aqsa Yasin
I am a self-motivated information technology professional with a passion for writing. I am a technical writer and love to write for all Linux flavors and Windows.