
List files in a folder as a stream to begin processing immediately

I have a folder with 1 million files in it.

I would like to start processing immediately, while the files in this folder are still being listed, in Python or another scripting language.

The usual functions (os.listdir in Python...) are blocking, and my program has to wait for the whole listing to finish, which can take a long time.

What's the best way to list huge folders?

Adrien asked Dec 09 '10 22:12


1 Answer

If convenient, change your directory structure; but if not, you can use ctypes to call opendir and readdir.

Here is a copy of the ctypes code; all I did was indent it properly, add the try/finally block, and fix a bug. You might have to debug it, particularly the struct layout.

Note that this code is not portable. You would need to use different functions on Windows, and I think the structs vary from Unix to Unix.

#!/usr/bin/python
"""
An equivalent os.listdir but as a generator using ctypes
"""

from ctypes import CDLL, c_char_p, c_int, c_long, c_ushort, c_byte, c_char, Structure, POINTER
from ctypes.util import find_library

class c_dir(Structure):
    """Opaque type for directory entries, corresponds to struct DIR"""
    pass
c_dir_p = POINTER(c_dir)

class c_dirent(Structure):
    """Directory entry"""
    # FIXME not sure these are the exactly correct types!
    _fields_ = (
        ('d_ino', c_long), # inode number
        ('d_off', c_long), # offset to the next dirent
        ('d_reclen', c_ushort), # length of this record
        ('d_type', c_byte), # type of file; not supported by all file system types
        ('d_name', c_char * 256) # filename; glibc on Linux declares char d_name[256]
        )
c_dirent_p = POINTER(c_dirent)

c_lib = CDLL(find_library("c"))
opendir = c_lib.opendir
opendir.argtypes = [c_char_p]
opendir.restype = c_dir_p

# FIXME Should probably use readdir_r here
readdir = c_lib.readdir
readdir.argtypes = [c_dir_p]
readdir.restype = c_dirent_p

closedir = c_lib.closedir
closedir.argtypes = [c_dir_p]
closedir.restype = c_int

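def listdir(path):
    """
    A generator to return the names of files in the directory passed in
    """
    # opendir() takes a C string, so encode text paths to bytes first
    if not isinstance(path, bytes):
        path = path.encode()
    dir_p = opendir(path)
    if not dir_p:
        # opendir() returns NULL on failure (missing directory, no permission...)
        raise OSError("opendir failed for %r" % (path,))
    try:
        while True:
            p = readdir(dir_p)
            if not p:
                break
            # d_name comes back as bytes; decode it (assumes UTF-8 file names)
            name = p.contents.d_name.decode()
            if name not in (".", ".."):
                yield name
    finally:
        closedir(dir_p)

if __name__ == "__main__":
    for name in listdir("."):
        print(name)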
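
If you are on Python 3.6 or later, a portable alternative to the ctypes code above is os.scandir from the standard library, which returns a lazy iterator instead of building the whole list first. This is a different technique from the opendir/readdir wrapper above, and iter_names is just a hypothetical helper name for illustration. A minimal sketch:

```python
import os


def iter_names(path):
    """Yield entry names in `path` one at a time, without building a full list."""
    # os.scandir returns an iterator of DirEntry objects and never
    # reports the "." and ".." entries, so no filtering is needed.
    # Using it as a context manager (Python 3.6+) closes the directory
    # handle even if the caller stops iterating early.
    with os.scandir(path) as entries:
        for entry in entries:
            yield entry.name


if __name__ == "__main__":
    for name in iter_names("."):
        print(name)
```

Because it is a generator, the loop body can start working on the first name before the rest of the directory has been read, and it should work on Windows as well as Unix.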

Jason Orendorff answered Sep 19 '22 02:09