A faster way to walk directories than os.listdir?

I am trying to improve the performance of elFinder, an AJAX-based file manager (elRTE.ru).

It calls os.listdir recursively to walk through all directories, and it takes a performance hit: listing a directory with 3000+ files takes 7 seconds.

Here is its walking function:

        for d in os.listdir(path):
            pd = os.path.join(path, d)
            if os.path.isdir(pd) and not os.path.islink(pd) and self.__isAccepted(d):
                tree['dirs'].append(self.__tree(pd))

My questions are:

  1. If I switch from os.listdir to os.walk, would it improve performance?
  2. How about using dircache.listdir()? It could cache the WHOLE directory/subdirectory contents on the initial request and return the cached results if no new files were uploaded and no files changed.
  3. Is there any other method of directory walking that is faster?
  4. Is there any other fast server-side file browser written in Python (though I would prefer to make this one fast)?

Phyo Arkar Lwin asked Jul 01 '10 22:07



2 Answers

I was just trying to figure out how to speed up os.walk on a largish file system (350,000 files spread across around 50,000 directories). I'm on a Linux box using an ext3 file system. I discovered that there is a way to speed this up for MY case.

Specifically, using a top-down walk, any time os.walk returns a list of more than one directory, I use os.stat to get the inode number of each directory and sort the directory list by inode number. This makes the walk visit the subdirectories mostly in inode order, which reduces disk seeks.
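The inode-ordering idea can be sketched roughly as follows. This is a minimal illustration, not the answerer's actual code: the function name `walk_in_inode_order` is made up for this example, and it relies on the documented behavior that os.walk with topdown=True honors in-place changes to the directory list.

```python
import os

def walk_in_inode_order(top):
    """Yield (dirpath, dirnames, filenames) like os.walk, but descend
    into subdirectories in ascending inode order.

    Hypothetical helper name; a sketch of the approach described above.
    """
    for dirpath, dirnames, filenames in os.walk(top, topdown=True):
        if len(dirnames) > 1:
            # Sorting in place changes the order in which os.walk descends,
            # because a top-down walk re-reads this list after each yield.
            dirnames.sort(
                key=lambda d: os.stat(os.path.join(dirpath, d)).st_ino
            )
        yield dirpath, dirnames, filenames
```

Whether this helps depends on the file system and storage: it costs one extra os.stat per subdirectory, so it is likely to pay off only where seek time dominates, such as a spinning disk.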

For my use case, it sped up my complete directory walk from 18 minutes to 13 minutes.

garlon4 answered Sep 19 '22 15:09


Did you check out scandir (previously betterwalk)? I have not tried it myself, but there's a discussion about it here and another one here. It claims a speedup of 3~10x on Mac OS X/Linux and 7~50x on Windows by avoiding redundant calls to os.stat(). It's also included in the standard library as os.scandir() as of Python 3.5.
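As a rough sketch of how the loop from the question might look with os.scandir: the function name `build_dir_tree`, the `is_accepted` callback (standing in for the question's `__isAccepted`), and the shape of the returned dict are assumptions made for illustration. The `with` form of os.scandir needs Python 3.6 or later.

```python
import os

def build_dir_tree(path, is_accepted=lambda name: True):
    """Return {'path': ..., 'dirs': [...]} for the real (non-symlink)
    subdirectories under path, recursively.

    Hypothetical helper; is_accepted is a stand-in name filter.
    """
    with os.scandir(path) as entries:
        # is_dir(follow_symlinks=False) is False for symlinks, so it covers
        # the isdir() + islink() pair. It usually needs no extra system
        # call, because the OS reports the entry type in the listing.
        subdirs = [
            entry.path
            for entry in entries
            if entry.is_dir(follow_symlinks=False) and is_accepted(entry.name)
        ]
    # Recurse only after the scandir iterator is closed, so deep trees
    # do not hold one open directory handle per level.
    return {
        "path": path,
        "dirs": [build_dir_tree(sub, is_accepted) for sub in subdirs],
    }
```

For example, `build_dir_tree("/srv/files", lambda name: not name.startswith("."))` would skip dot-directories; the path here is only a placeholder.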

Python's built-in os.walk() is significantly slower than it needs to be, because -- in addition to calling listdir() on each directory -- it calls stat() on each file to determine whether the filename is a directory or not. But both FindFirstFile / FindNextFile on Windows and readdir on Linux/OS X already tell you whether the files returned are directories or not, so no further stat system calls are needed. In short, you can reduce the number of system calls from about 2N to N, where N is the total number of files and directories in the tree.

In practice, removing all those extra system calls makes os.walk() about 7-50 times as fast on Windows, and about 3-10 times as fast on Linux and Mac OS X.

From the project's readme.
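To check the speedup on your own directory tree rather than rely on the quoted figures, a rough timing sketch could look like this. All function names here are made up for the example; both counters skip symlinked directories, like the code in the question.

```python
import os
import time

def count_dirs_listdir(path):
    """Count subdirectories with os.listdir plus isdir/islink per entry."""
    total = 0
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isdir(full) and not os.path.islink(full):
            total += 1 + count_dirs_listdir(full)
    return total

def count_dirs_scandir(path):
    """Count subdirectories with os.scandir, reusing the listing's type info."""
    with os.scandir(path) as entries:
        subdirs = [e.path for e in entries if e.is_dir(follow_symlinks=False)]
    return sum(1 + count_dirs_scandir(sub) for sub in subdirs)

def compare(root):
    """Return {'listdir': (count, seconds), 'scandir': (count, seconds)}."""
    results = {}
    for label, func in (("listdir", count_dirs_listdir),
                        ("scandir", count_dirs_scandir)):
        start = time.perf_counter()
        count = func(root)
        results[label] = (count, time.perf_counter() - start)
    return results
```

The first pass warms the operating system's file cache, so run `compare()` more than once, or swap the order, before reading much into the numbers.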

gaborous answered Sep 19 '22 15:09