 

Why does python's os.walk() not reflect directory deletion?

Tags: python, os.walk

I'm attempting to write a Python function that will recursively delete all empty directories. This means that if directory "a" contains only "b", "b" should be deleted, then "a" should be deleted (since it now contains nothing). If a directory contains anything, it is skipped. Illustrated:

top/a/b/
top/c/d.txt
top/c/foo/

Given this, the three directories "b", "a", and "foo" should be deleted, as "foo" and "b" are empty now, and "a" will become empty after the deletion of "b".

I'm attempting to do this via os.walk and shutil.rmtree. Unfortunately, my code only deletes directories that were already empty, not ones that become empty in the process.

I'm using the topdown=False parameter of os.walk. The documentation for os.walk says that "If topdown is False, the triple for a directory is generated after the triples for all of its subdirectories (directories are generated bottom-up)." That's not what I'm seeing.
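The ordering part of that documentation can be checked on its own. The quick check below is written for Python 3, and the helper name walk_order is hypothetical. It rebuilds the layout from the question in a temporary directory and records the order in which a bottom-up walk yields each directory, relative to the top:

```python
import os
import tempfile


def walk_order(top):
    """Return the directories yielded by a bottom-up os.walk, relative to top."""
    order = []
    for root, dirs, files in os.walk(top, topdown=False):
        order.append(os.path.relpath(root, top))
    return order


with tempfile.TemporaryDirectory() as top:
    # Recreate the layout from the question: top/a/b/, top/c/d.txt, top/c/foo/
    os.makedirs(os.path.join(top, "a", "b"))
    os.makedirs(os.path.join(top, "c", "foo"))
    with open(os.path.join(top, "c", "d.txt"), "w") as f:
        f.write("data")
    order = walk_order(top)
    print(order)  # children before parents; "a" vs "c" branch order may vary
```

Each child appears before its parent, and "." (the top) comes last. Only the relative order of the "a" and "c" branches depends on the filesystem.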

Here's my code:

import os
import shutil

for root, dirs, files in os.walk(".", topdown=False):
  contents = dirs + files
  print root, "contains:", contents
  if len(contents) == 0:
    print 'Removing "%s"' % root
    shutil.rmtree(root)
  else:
    print 'Not removing "%s". It has:' % root, contents

If I have the directory structure described above, here's what I get:

./c/foo contains: []
Removing "./c/foo"
./c contains: ['foo', 'd.txt']
Not removing "./c". It has: ['foo', 'd.txt']
./a/b contains: []
Removing "./a/b"
./a contains: ['b']
Not removing "./a". It has: ['b']
. contains: ['c', 'a']
Not removing ".". It has: ['c', 'a']

Note that, even though I've removed "b", "a" is not removed, thinking that it still contains "b". What I'm confused about is that the documentation for os.walk says that it generates the triple for "./a" after generating the triple for "b". My output suggests otherwise. Similar story for "./c". It shows that it still has "foo", even though I had deleted it right out of the gate.

What am I doing wrong? (I'm using Python 2.6.6.)

seanahern asked Feb 09 '15 20:02



2 Answers

The documentation for os.walk also says this:

No matter the value of topdown, the list of subdirectories is retrieved before the tuples for the directory and its subdirectories are generated.
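In other words, the bottom-up ordering works as documented; what is stale is the dirs list. The small Python 3 reproduction below deletes the empty leaf directory during a bottom-up walk and records what os.walk still reports for its parent. The function name report_after_removal is hypothetical, and the sketch uses os.rmdir in place of shutil.rmtree:

```python
import os
import tempfile


def report_after_removal(top):
    """Remove every empty directory met during a bottom-up walk and record
    what os.walk reports as each directory's subdirectories."""
    seen = {}
    for root, dirs, files in os.walk(top, topdown=False):
        seen[os.path.relpath(root, top)] = sorted(dirs)
        if not dirs and not files:
            os.rmdir(root)  # root is empty, so os.rmdir is enough
    return seen


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "a", "b"))
    seen = report_after_removal(top)
    # "b" was deleted before "a" was yielded, yet "a" still lists it
    print(seen["a"])                                   # ['b']
    print(os.path.isdir(os.path.join(top, "a", "b")))  # False
```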

jcfollower answered Sep 21 '22 15:09



jcfollower's answer is absolutely correct about the cause of the issue you're encountering: the filesystem is always read top-down, even if os.walk yields the results bottom-up. Each directory's lists of subdirectories and files are read before its children are visited, so changes you make to the filesystem while iterating won't be reflected in the lists for directories that are yielded later.
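To see why, here is a simplified, hypothetical model of a bottom-up walk in Python 3. It is not the real os.walk source. It omits error handling, and it recurses into symlinked directories, which os.walk does not do by default. It does show the key point: each directory is listed before its children are visited, and that earlier listing is what gets yielded at the end.

```python
import os
import tempfile


def bottom_up_walk(top):
    """Simplified model of os.walk(top, topdown=False) (hypothetical helper)."""
    dirs, files = [], []
    # Step 1: read this directory's entries now, before visiting any child.
    with os.scandir(top) as entries:
        for entry in entries:
            (dirs if entry.is_dir() else files).append(entry.name)
    # Step 2: walk each child; anything the caller deletes happens during this.
    for name in dirs:
        yield from bottom_up_walk(os.path.join(top, name))
    # Step 3: yield the lists captured in step 1, which may now be stale.
    yield top, dirs, files


def normalize(walk):
    """Sort the triples so two walks can be compared regardless of order."""
    return sorted((root, sorted(d), sorted(f)) for root, d, f in walk)


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "a", "b"))
    os.makedirs(os.path.join(top, "c", "foo"))
    open(os.path.join(top, "c", "d.txt"), "w").close()
    model = normalize(bottom_up_walk(top))
    real = normalize(os.walk(top, topdown=False))
    print(model == real)  # True
```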

A solution to this issue is to maintain a set of the deleted directories, so that you can filter them out of their parent's list of subdirectories:

import os
import shutil

removed = set()                                               # first new line
for root, dirs, files in os.walk(".", topdown=False):
    dirs = [d for d in dirs if os.path.join(root, d) not in removed]  # second
    contents = dirs + files
    print root, "contains:", contents
    if len(contents) == 0:
        print 'Removing "%s"' % root
        shutil.rmtree(root)
        removed.add(root)                                     # third new line
    else:
        print 'Not removing "%s". It has:' % root, contents

There are three new lines. The first, at the top, creates an empty set, removed, to hold the paths of the deleted directories. The second replaces the dirs list with a new list that leaves out any subdirectories in the removed set, since they were deleted in an earlier iteration. The last new line adds the current directory to the set once it has been removed.
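For Python 3, the same removed-set approach can be packaged as a function. This is a sketch, not the answer's exact code. The name remove_empty_dirs is hypothetical, and the sketch deliberately swaps shutil.rmtree for os.rmdir, which refuses to delete a directory that still has contents. It is exercised on the question's layout in a temporary directory:

```python
import os
import tempfile


def remove_empty_dirs(top):
    """Delete every directory under top (top included) that is empty or
    becomes empty once its empty subdirectories are deleted.

    Returns the deleted paths in deletion order.
    """
    removed = set()
    order = []
    for root, dirs, files in os.walk(top, topdown=False):
        # os.walk listed dirs before the children were visited, so filter
        # out the ones this loop has already deleted.
        remaining = [d for d in dirs if os.path.join(root, d) not in removed]
        if not remaining and not files:
            os.rmdir(root)  # raises OSError rather than deleting a non-empty dir
            removed.add(root)
            order.append(root)
    return order


with tempfile.TemporaryDirectory() as top:
    # Layout from the question: top/a/b/, top/c/d.txt, top/c/foo/
    os.makedirs(os.path.join(top, "a", "b"))
    os.makedirs(os.path.join(top, "c", "foo"))
    with open(os.path.join(top, "c", "d.txt"), "w") as f:
        f.write("keep me")

    deleted = [os.path.relpath(p, top) for p in remove_empty_dirs(top)]
    left_in_top = sorted(os.listdir(top))
    left_in_c = sorted(os.listdir(os.path.join(top, "c")))
    print(deleted)      # "a/b" comes before "a"; branch order may vary
    print(left_in_top)  # ['c']
    print(left_in_c)    # ['d.txt']
```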

Blckknght answered Sep 23 '22 15:09
