I'm attempting to write a Python function that will recursively delete all empty directories. This means that if directory "a" contains only "b", "b" should be deleted, then "a" should be deleted (since it now contains nothing). If a directory contains anything, it is skipped. Illustrated:
top/a/b/
top/c/d.txt
top/c/foo/
Given this, the three directories "b", "a", and "foo" should be deleted, as "foo" and "b" are empty now, and "a" will become empty after the deletion of "b".
I'm attempting to do this via os.walk and shutil.rmtree. Unfortunately, my code is only deleting the first level of directories, but not ones newly emptied in the process.
I'm using the topdown=False parameter of os.walk. The documentation for os.walk says that "If topdown is False, the triple for a directory is generated after the triples for all of its subdirectories (directories are generated bottom-up)." That's not what I'm seeing.
Here's my code:
import os
import shutil

for root, dirs, files in os.walk(".", topdown=False):
    contents = dirs+files
    print root,"contains:",contents
    if len(contents) == 0:
        print 'Removing "%s"'%root
        shutil.rmtree(root)
    else:
        print 'Not removing "%s". It has:'%root,contents
If I have the directory structure described above, here's what I get:
./c/foo contains: []
Removing "./c/foo"
./c contains: ['foo', 'd.txt']
Not removing "./c". It has: ['foo', 'd.txt']
./a/b contains: []
Removing "./a/b"
./a contains: ['b']
Not removing "./a". It has: ['b']
. contains: ['c', 'a']
Not removing ".". It has: ['c', 'a']
Note that, even though I've removed "b", "a" is not removed, because the walk reports that it still contains "b". What I'm confused about is that the documentation for os.walk says that it generates the triple for "./a" after generating the triple for "b". My output suggests otherwise. It's a similar story for "./c": it shows that it still has "foo", even though I had deleted it right out of the gate.
What am I doing wrong? (I'm using Python 2.6.6.)
The documentation has this:
"No matter the value of topdown, the list of subdirectories is retrieved before the tuples for the directory and its subdirectories are generated."
jcfollower's answer is absolutely correct about the cause of the issue you're encountering: the file system is always read top-down, even if the results are yielded from os.walk in a bottom-up manner. This means that the filesystem modifications you perform won't be reflected in the later results.
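To see the stale listing in isolation, here is a minimal sketch. It is written in Python 3 syntax (an assumption; the question uses Python 2.6.6), and the function name stale_listing_demo is made up for illustration. It builds a throwaway tree containing only a/b, deletes "b" when the walk reaches it, and then compares what os.walk reports for "a" with what is actually on disk at that moment.

```python
import os
import tempfile


def stale_listing_demo():
    """Return (dirs os.walk reports for 'a', entries actually in 'a')."""
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "a", "b"))
        reported = actual = None
        for root, dirs, files in os.walk(top, topdown=False):
            name = os.path.basename(root)
            if name == "b":
                os.rmdir(root)  # the empty leaf is visited (and deleted) first
            elif name == "a":
                reported = list(dirs)      # list gathered before "b" was deleted
                actual = os.listdir(root)  # fresh listing taken right now
        return reported, actual


print(stale_listing_demo())  # (['b'], [])
```

The first element still names "b" because the subdirectory list for "a" was read before the walk descended into it; the second is empty because the directory is already gone.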
A solution to this issue is to maintain a set of the deleted directories, so that you can filter them out of their parent's list of subdirectories:
removed = set()                                                             # first new line
for root, dirs, files in os.walk(".", topdown=False):
    dirs = [dir for dir in dirs if os.path.join(root, dir) not in removed]  # second
    contents = dirs+files
    print root,"contains:",contents
    if len(contents) == 0:
        print 'Removing "%s"'%root
        shutil.rmtree(root)
        removed.add(root)                                                   # third new line
    else:
        print 'Not removing "%s". It has:'%root,contents
There are three new lines. The first, at the top, creates an empty set named removed to hold the removed directories. The second replaces the dirs list with a new list that doesn't include any subdirectories that are in the removed set, since they were deleted in a previous step. The last new line adds the current directory to the set once it has been removed.
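A different approach, not the one used above, is to ignore the dirs and files lists altogether and ask the file system for a fresh listing each time a directory is visited. The following is only a sketch under some assumptions: it uses Python 3 syntax rather than the question's Python 2.6.6, the name remove_empty_dirs is hypothetical, and it deliberately leaves the starting directory in place. It uses os.rmdir, which refuses to delete a non-empty directory, instead of shutil.rmtree.

```python
import os


def remove_empty_dirs(top):
    """Delete directories under `top` that are empty or become empty.

    Returns the list of removed paths, deepest first.
    """
    removed = []
    for root, dirs, files in os.walk(top, topdown=False):
        if root == top:
            continue  # assumption: never delete the starting directory
        # os.listdir reflects deletions made earlier in this loop,
        # unlike the dirs list that os.walk gathered beforehand.
        if not os.listdir(root):
            os.rmdir(root)
            removed.append(root)
    return removed
```

On the example tree from the question, this should remove "b", "a", and "foo" and leave "c" with d.txt in place. The trade-off is one extra directory listing per directory visited.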