
Efficiently removing subdirectories in dirnames from os.walk

On a Mac in Python 2.7, when walking through directories using os.walk, my script descends into 'apps', i.e. appname.app, since those are really just directories themselves. Later in processing I hit errors when going through them. I don't want to go through them anyway, so for my purposes it would be best to just ignore those types of 'directories'.

So this is my current solution:

for root, subdirs, files in os.walk(directory, True):
    for subdir in subdirs:
        if '.' in subdir:
            subdirs.remove(subdir)
    #do more stuff

As you can see, the second for loop will run for every iteration of subdirs, which seems unnecessary since the first pass removes everything I want to remove anyway.

There must be a more efficient way to do this. Any ideas?

Patrick Bateman asked May 16 '12 14:05




2 Answers

You can do something like this (assuming you want to ignore directories containing '.'):

subdirs[:] = [d for d in subdirs if '.' not in d]

The slice assignment (rather than just subdirs = ...) is necessary because you need to modify the same list that os.walk is using, not create a new one.
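As a minimal sketch of this, the following builds a throwaway tree with tempfile and walks it with the slice assignment in place. The directory names and the helper walk_pruned are made up for illustration. In top-down mode os.walk reads the subdirs list again after each yield, so anything removed from that same list object is never visited.

```python
import os
import tempfile


def walk_pruned(top):
    """Return the directories visited (relative to top), skipping names containing '.'."""
    visited = []
    for root, subdirs, files in os.walk(top, topdown=True):
        # Slice assignment mutates the list object os.walk holds on to,
        # so the pruned directories are never descended into.
        subdirs[:] = [d for d in subdirs if '.' not in d]
        visited.append(os.path.relpath(root, top))
    return sorted(visited)


# Hypothetical layout: one ordinary directory and one ".app" bundle.
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, 'docs', 'notes'))
os.makedirs(os.path.join(top, 'Example.app', 'Contents'))

# Example.app and Example.app/Contents do not appear in the result.
print(walk_pruned(top))
```

Replacing the slice assignment with a plain subdirs = ... would only rebind the local name, and the walk would still enter Example.app/Contents.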

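Note that your original code is incorrect because it modifies the list while iterating over it. Python does not raise an error for this, but the loop skips the element that follows each removed one, so some directories are never checked.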
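A small sketch of what goes wrong when a list is modified during iteration, using a plain list with made-up names in place of the subdirs list from os.walk:

```python
names = ['a.app', 'b.app', 'docs']

# Pattern from the question: removing from the list being iterated over.
buggy = list(names)
for name in buggy:
    if '.' in name:
        buggy.remove(name)

# Building a new list with a comprehension checks every element.
fixed = [name for name in names if '.' not in name]

print(buggy)  # ['b.app', 'docs'] -- 'b.app' was skipped after 'a.app' was removed
print(fixed)  # ['docs']
```

After 'a.app' is removed, 'b.app' shifts into the position the loop has already passed, so it is never examined.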

interjay answered Nov 15 '22 20:11



Perhaps this example from the Python docs for os.walk will be helpful. It works from the bottom up (deleting).

# Delete everything reachable from the directory named in "top",
# assuming there are no symbolic links.
# CAUTION:  This is dangerous!  For example, if top == '/', it
# could delete all your disk files.
import os
for root, dirs, files in os.walk(top, topdown=False):
    for name in files:
        os.remove(os.path.join(root, name))
    for name in dirs:
        os.rmdir(os.path.join(root, name))

I am a bit confused about your goal. Are you trying to remove a directory subtree and encountering errors, or are you trying to walk a tree and just list simple file names (excluding directory names)?

Levon answered Nov 15 '22 20:11
