I currently have a Python program that calls shutil.rmtree when it finishes, to delete a large number of files that it creates as it executes. This call takes on the order of 20+ seconds. I have profiled this using cProfile, and almost all of this time is spent in posix.remove calls.
If I don't delete these files as part of the Python program but instead call rm -rf on the folder after the program is finished executing, the rm -rf executes in <5 seconds.
Is there something in particular that may be causing this huge difference in execution time?
shutil.rmtree calls os.stat on every entry it traverses to determine whether it is a file or a directory. That is a massive waste of time, since the same information is already obtained when the directory is listed.
The os.walk function takes advantage of this information (it has been built on os.scandir since Python 3.5; see PEP 471 for details), so you can use it to implement rmtree yourself:
import os

def rmtree(directory):
    # Walk bottom-up so each directory is already empty when it is removed.
    for root, dirs, files in os.walk(directory, topdown=False):
        for name in files:
            os.remove(os.path.join(root, name))
        for name in dirs:
            os.rmdir(os.path.join(root, name))
    os.rmdir(directory)
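If you would rather skip os.walk's intermediate lists, the same idea can be sketched directly on top of os.scandir, whose entries carry the file type obtained during the directory listing. This is only a rough sketch under some assumptions: the names rmtree_scandir and make_sample_tree are made up for illustration, there is no error handling, and unlinking a symlink with os.remove is assumed to behave as it does on POSIX systems.

```python
import os
import tempfile


def rmtree_scandir(directory):
    """Recursively delete `directory`, reusing the entry types from os.scandir."""
    with os.scandir(directory) as entries:
        for entry in entries:
            # is_dir(follow_symlinks=False) usually needs no extra stat call.
            # A symlink to a directory is not followed; it is unlinked below.
            if entry.is_dir(follow_symlinks=False):
                rmtree_scandir(entry.path)
            else:
                os.remove(entry.path)
    # Every child is gone at this point, so the directory itself can go.
    os.rmdir(directory)


def make_sample_tree(file_count=100):
    """Hypothetical helper: build a small throwaway tree to try the function on."""
    root = tempfile.mkdtemp()
    sub = os.path.join(root, "sub")
    os.mkdir(sub)
    for i in range(file_count):
        for parent in (root, sub):
            with open(os.path.join(parent, "f%d.tmp" % i), "w") as handle:
                handle.write("x")
    return root


if __name__ == "__main__":
    tree = make_sample_tree()
    rmtree_scandir(tree)
    print(os.path.exists(tree))
```

Whether this closes the gap to rm -rf depends on the filesystem and the Python version, so it is worth timing both variants on your own data before switching.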