I am trying to improve performance of elfinder , an ajax based file manager(elRTE.ru) .
It uses os.listdir in a recurisve to walk through all directories recursively and having a performance hit (like listing a dir with 3000 + files takes 7 seconds ) ..
I am trying to improve performance for it here is it's walking function:
for d in os.listdir(path):
pd = os.path.join(path, d)
if os.path.isdir(pd) and not os.path.islink(pd) and self.__isAccepted(d):
tree['dirs'].append(self.__tree(pd))
My questions are :
scandir() is a directory iteration function like os. listdir(), except that instead of returning a list of bare filenames, it yields DirEntry objects that include file type and stat information along with the name. Using scandir() increases the speed of os.
The Python os. listdir() method returns a list of every file and folder in a directory. os. walk() function returns a list of every file in an entire file tree.
listdir() method in python is used to get the list of all files and directories in the specified directory. If we don't specify any directory, then list of files and directories in the current working directory will be returned. Syntax: os.listdir(path)
In practice, removing all those extra system calls makes os. walk() about 8-9 times as fast on Windows, and about 2-3 times as fast on POSIX systems. So we're not talking about micro- optimizations. See more benchmarks here.
I was just trying to figure out how to speed up os.walk on a largish file system (350,000 files spread out within around 50,000 directories). I'm on a linux box usign an ext3 file system. I discovered that there is a way to speed this up for MY case.
Specifically, Using a top-down walk, any time os.walk returns a list of more than one directory, I use os.stat to get the inode number of each directory, and sort the directory list by inode number. This makes walk mostly visit the subdirectories in inode order, which reduces disk seeks.
For my use case, it sped up my complete directory walk from 18 minutes down to 13 minutes...
Did you check out scandir (previously betterwalk)? Did not try it myself, but there's a discussion about it here and another one here. It claims to have a speedup of 3~10x on MacOSX/Linux and 7~50x on Windows by avoiding redundant calls to os.stat(). It's also now included in the standard library as of Python 3.5.
Python's built-in os.walk() is significantly slower than it needs to be, because -- in addition to calling listdir() on each directory -- it calls stat() on each file to determine whether the filename is a directory or not. But both FindFirstFile / FindNextFile on Windows and readdir on Linux/OS X already tell you whether the files returned are directories or not, so no further stat system calls are needed. In short, you can reduce the number of system calls from about 2N to N, where N is the total number of files and directories in the tree.
In practice, removing all those extra system calls makes os.walk() about 7-50 times as fast on Windows, and about 3-10 times as fast on Linux and Mac OS X.
From the project's readme.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With