
partial directory listing

Tags: python, linux

Is it possible to get a partial directory listing?

In Python, I have a process that calls os.listdir on a directory containing more than 100,000 files, and it takes forever. I'd like to be able to get, say, a listing of the first 1,000 files quickly.

How can I achieve this?

jldupont asked Aug 29 '12 at 02:08


1 Answer

I found a solution, although it returns the files in an arbitrary order :) (at least I can't see a pattern).

First I found this post on the Python mailing list. It has three attached files that you need to copy to your disk (opendir.pyx, setup.py, test.py). Next you need the Python package Pyrex to compile opendir.pyx. I had problems installing Pyrex and found that I first had to install python-dev via apt-get. Then I installed the opendir package from the three downloaded files with python setup.py install. The file test.py contains examples of how to use it.
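On Python 3.6 and later the standard library can do the same thing without compiling an extension: `os.scandir()` returns a lazy iterator over directory entries (on POSIX it is built on `opendir()`/`readdir()`), so `itertools.islice` can stop after the first n entries instead of building the full list that `os.listdir` returns. This is a minimal sketch, not the mailing-list module; `list_first` is a hypothetical helper name, and as with `readdir()` the order is whatever the filesystem hands back, not sorted.

```python
import itertools
import os


def list_first(path, n):
    """Return up to n entry names from `path` (hypothetical helper).

    os.scandir() yields entries lazily and islice() stops pulling after n,
    so a list of every name in the directory is never built.
    """
    # Using scandir as a context manager (Python 3.6+) closes the directory
    # handle even though the iterator is not exhausted.
    with os.scandir(path) as entries:
        return [entry.name for entry in itertools.islice(entries, n)]
```

For the question's case this would be `list_first("/some/big/dir", 1000)`. Like the opendir approach, the time spent depends on n rather than on the total number of files.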

Next I wanted to know how much faster this solution is than os.listdir, so I created 200000 files with the following small shell script:

for((i=0; i<200000; i++))
do
    touch $i
done
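If you'd rather stay in Python, a rough equivalent of the shell loop above can build the test directory. This is a sketch; `make_test_dir` is a hypothetical helper, and unlike the script it writes into a fresh temporary directory rather than the current one, so the files are easy to delete afterwards.

```python
import os
import tempfile


def make_test_dir(count):
    """Create `count` empty files named 0 .. count-1 in a new temp directory.

    Returns the directory path; the caller is responsible for removing it.
    """
    path = tempfile.mkdtemp(prefix="listdir-bench-")
    for i in range(count):
        # Opening in "w" mode and closing immediately creates an empty file,
        # like `touch` does for a file that doesn't exist yet.
        with open(os.path.join(path, str(i)), "w"):
            pass
    return path
```

Calling `make_test_dir(200000)` and then `os.chdir()` into the returned path reproduces the setup; `shutil.rmtree()` removes it when you're done.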

The following script is my benchmark, run in the directory where I had just created the files:

from opendir import opendir
from timeit import Timer
import os

def list_first_fast(i):
    # Read entries one at a time from the directory stream, stop after i names
    d = opendir(".")
    filenames = []
    for _ in range(i):
        name = d.read()
        if not name:
            # end of the directory reached
            break
        filenames.append(name)
    return filenames

def list_first_slow(i):
    # Build the complete listing first, then keep only the first i names
    return os.listdir(".")[:i]

if __name__ == '__main__':
    # repeat(5, 100): 5 measurements, each timing 100 calls
    t1 = Timer("list_first_fast(100)", "from __main__ import list_first_fast")
    t2 = Timer("list_first_slow(100)", "from __main__ import list_first_slow")
    print "With opendir: ", t1.repeat(5, 100)
    print "With os.list: ", t2.repeat(5, 100)

The output on my system is:

With opendir:  [0.045053958892822266, 0.04376697540283203, 0.0437769889831543, 0.04387712478637695, 0.04404592514038086]
With os.list:  [9.50291895866394, 9.567682027816772, 9.865844964981079, 13.486984968185425, 9.51977801322937]

As you can see, I got a speedup of roughly a factor of 200 when returning a list of 100 filenames out of the 200000, which is pretty nice :).
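The factor can be checked from the numbers above. The `timeit` documentation suggests looking at the minimum of the `repeat()` results rather than the mean, since higher values are usually caused by other processes interfering (the 13.49 s run is likely such a case). A small sketch that computes the ratio from the reported timings:

```python
# Timings copied from the output above (seconds per 100 calls).
with_opendir = [0.045053958892822266, 0.04376697540283203, 0.0437769889831543,
                0.04387712478637695, 0.04404592514038086]
with_listdir = [9.50291895866394, 9.567682027816772, 9.865844964981079,
                13.486984968185425, 9.51977801322937]

# Compare the best-case runs, as the timeit docs recommend.
speedup = min(with_listdir) / min(with_opendir)
print("speedup: %.0fx" % speedup)  # prints: speedup: 217x
```

So "a factor of 200" is, if anything, slightly conservative for this run.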

I hope this is what you were trying to achieve.

halex answered Oct 17 '22 at 01:10