
Why is shutil.rmtree() so slow?

Tags: python, rm

I went to check how to remove a directory in Python and was led to shutil.rmtree(). Its speed surprised me: it was slower than I'd expect from rm --recursive. Are there faster alternatives, short of using the subprocess module?

tshepang asked Mar 29 '11 10:03


2 Answers

The implementation does a lot of extra processing:

import os
import stat
import sys

def rmtree(path, ignore_errors=False, onerror=None):
    """Recursively delete a directory tree.

    If ignore_errors is set, errors are ignored; otherwise, if onerror
    is set, it is called to handle the error with arguments (func,
    path, exc_info) where func is os.listdir, os.remove, or os.rmdir;
    path is the argument to that function that caused it to fail; and
    exc_info is a tuple returned by sys.exc_info(). If ignore_errors
    is false and onerror is None, an exception is raised.

    """
    if ignore_errors:
        def onerror(*args):
            pass
    elif onerror is None:
        def onerror(*args):
            # re-raise the exception that is currently being handled
            raise
    try:
        if os.path.islink(path):
            # symlinks to directories are forbidden, see bug #1669
            raise OSError("Cannot call rmtree on a symbolic link")
    except OSError:
        onerror(os.path.islink, path, sys.exc_info())
        # can't continue even if onerror hook returns
        return
    names = []
    try:
        names = os.listdir(path)
    except os.error:
        onerror(os.listdir, path, sys.exc_info())
    for name in names:
        # a new full path string is built for every entry
        fullname = os.path.join(path, name)
        try:
            # one lstat() call, and one Python stat result, per entry
            mode = os.lstat(fullname).st_mode
        except os.error:
            mode = 0
        if stat.S_ISDIR(mode):
            rmtree(fullname, ignore_errors, onerror)
        else:
            try:
                os.remove(fullname)
            except os.error:
                onerror(os.remove, fullname, sys.exc_info())
    try:
        os.rmdir(path)
    except os.error:
        onerror(os.rmdir, path, sys.exc_info())

Note the os.path.join() call used to build each new filename; string operations do take time. The rm(1) implementation instead uses the unlinkat(2) system call, which takes a file descriptor for the directory plus a name relative to it, so rm(1) never has to build a full path string. (It also saves the kernel from walking through an entire namei() lookup just to reach the same directory, over and over again. The kernel's dentry cache is good and useful, but the lookup can still involve a fair amount of in-kernel string manipulation and comparison.)
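For comparison, here is a rough sketch of the same file-descriptor idea in Python 3. It assumes a Unix platform where os.open(), os.unlink() and os.rmdir() accept dir_fd and where os.scandir() accepts a file descriptor (Python 3.7 or later). The name rmtree_fd is made up for illustration and is not part of shutil, and error handling is left out.

```python
import os


def rmtree_fd(name, parent_fd=None):
    """Remove the directory `name`, resolved relative to parent_fd if given.

    Hypothetical helper: every operation inside the tree is relative to an
    open directory descriptor, so no full path strings are built.
    """
    # O_NOFOLLOW makes the open fail if `name` is a symlink.
    flags = os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW
    fd = os.open(name, flags, dir_fd=parent_fd)
    try:
        # Read all entries first so the directory is not modified mid-scan.
        with os.scandir(fd) as it:
            entries = list(it)
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                rmtree_fd(entry.name, fd)
            else:
                # The Python-level counterpart of unlinkat(fd, name, 0).
                os.unlink(entry.name, dir_fd=fd)
    finally:
        os.close(fd)
    # The directory is empty now; remove it relative to its parent.
    os.rmdir(name, dir_fd=parent_fd)
```

As far as I know, shutil.rmtree() in Python 3.3 and later already uses a similar fd-based traversal on platforms that support it, so this is mainly useful for seeing what the technique looks like.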

Furthermore, both rm(1) and rmtree() check the st_mode of every file and directory in the tree, but the C implementation does not need to turn every struct stat into a Python object just to perform a simple integer mask operation. I don't know how long this conversion takes, but it happens once for every file, directory, pipe, symlink, etc. in the directory tree.
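To make the "simple integer mask operation" concrete, here is a small sketch of what the directory check boils down to on the Python side. The helper name is_dir_by_mask is made up for illustration.

```python
import os
import stat


def is_dir_by_mask(path):
    """Return True if path is a directory, without following symlinks."""
    # os.lstat() wraps the C struct in a Python os.stat_result object.
    mode = os.lstat(path).st_mode
    # S_IFMT() keeps only the file-type bits of the mode; comparing them
    # with S_IFDIR is the same test that stat.S_ISDIR() performs.
    return stat.S_IFMT(mode) == stat.S_IFDIR
```

The mask itself is cheap; the per-entry cost discussed above comes from making the system call and building the result object.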

sarnold answered Oct 05 '22 07:10


If you care about speed:

os.system('rm -fr "%s"' % your_dirname)
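One caveat if you do shell out: building the command with string interpolation breaks, or runs something unintended, when the directory name contains a double quote, a $ or backticks. A minimal sketch that avoids the shell entirely, assuming rm is on the PATH (rm_rf is a made-up helper name):

```python
import subprocess


def rm_rf(path):
    """Delete path with the external rm utility, without going through a shell."""
    # Passing a list means no shell parses the command, so quotes, spaces
    # or "$" in the path are passed through literally.
    # "--" ends option parsing, so a path starting with "-" is not a flag.
    subprocess.run(["rm", "-rf", "--", path], check=True)
```

With check=True a failing rm raises subprocess.CalledProcessError instead of being silently ignored.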

Apart from that, I have not found shutil.rmtree() to be much slower, although there is of course some extra overhead at the Python level. I would only believe such a claim if you provide reasonable numbers.
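A rough way to get such numbers is a small timing harness like the one below. The tree shape (50 directories of 20 empty files) and the helper names are arbitrary choices for illustration, and rm is assumed to be on the PATH. Results will vary with the filesystem and with caching, since freshly created files are likely still cached.

```python
import os
import shutil
import subprocess
import tempfile
import time


def make_tree(n_dirs=50, files_per_dir=20):
    """Create a throwaway tree of empty files and return its root path."""
    root = tempfile.mkdtemp()
    for d in range(n_dirs):
        sub = os.path.join(root, "d%03d" % d)
        os.mkdir(sub)
        for f in range(files_per_dir):
            open(os.path.join(sub, "f%03d" % f), "w").close()
    return root


def time_removal(remover, **tree_args):
    """Build a fresh tree, then return the seconds remover(root) takes."""
    root = make_tree(**tree_args)
    start = time.perf_counter()
    remover(root)
    return time.perf_counter() - start


def rm_subprocess(path):
    """Remove path with the external rm utility (includes process startup)."""
    subprocess.run(["rm", "-rf", "--", path], check=True)


if __name__ == "__main__":
    candidates = [("shutil.rmtree", shutil.rmtree), ("rm -rf", rm_subprocess)]
    for name, remover in candidates:
        # Take the best of a few runs to reduce noise.
        best = min(time_removal(remover) for _ in range(5))
        print("%-14s %.4f s" % (name, best))
```

For small trees the cost of starting the rm process can dominate, so it is worth trying a tree of roughly the size you actually need to delete.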

Andreas Jung answered Oct 05 '22 07:10