I have a folder with 1 million files in it.
I would like to start processing files immediately, while the folder is still being listed, in Python or another scripting language.
The usual functions (os.listdir in Python, for example) are blocking, so my program has to wait for the whole listing to finish, which can take a long time.
What's the best way to list huge folders?
If convenient, change your directory structure; if not, you can use ctypes to call opendir and readdir.
Here is a copy of that code; all I did was indent it properly, add the try/finally block, and fix a bug. You might have to debug it, particularly the struct layout.
Note that this code is not portable. You would need to use different functions on Windows, and I think the structs vary from Unix to Unix.
#!/usr/bin/python
"""
An equivalent os.listdir but as a generator using ctypes
"""
from ctypes import CDLL, c_char_p, c_int, c_long, c_ushort, c_byte, c_char, Structure, POINTER
from ctypes.util import find_library
class c_dir(Structure):
    """Opaque type for directory entries, corresponds to struct DIR"""
    pass
c_dir_p = POINTER(c_dir)
class c_dirent(Structure):
    """Directory entry"""
    # FIXME not sure these are the exactly correct types!
    _fields_ = (
        ('d_ino', c_long),  # inode number
        ('d_off', c_long),  # offset to the next dirent
        ('d_reclen', c_ushort),  # length of this record
        ('d_type', c_byte),  # type of file; not supported by all file system types
        ('d_name', c_char * 4096)  # filename
    )
c_dirent_p = POINTER(c_dirent)
c_lib = CDLL(find_library("c"))
opendir = c_lib.opendir
opendir.argtypes = [c_char_p]
opendir.restype = c_dir_p
# FIXME Should probably use readdir_r here
readdir = c_lib.readdir
readdir.argtypes = [c_dir_p]
readdir.restype = c_dirent_p
closedir = c_lib.closedir
closedir.argtypes = [c_dir_p]
closedir.restype = c_int
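def listdir(path):
    """
    A generator to return the names of files in the directory passed in
    """
    # On Python 3, path must be a byte string because opendir takes c_char_p.
    dir_p = opendir(path)
    if not dir_p:
        # opendir returns NULL on failure; passing NULL to readdir would crash.
        raise OSError("opendir failed for %r" % (path,))
    try:
        while True:
            p = readdir(dir_p)
            if not p:
                break
            name = p.contents.d_name
            if name not in (b".", b".."):
                yield name
    finally:
        closedir(dir_p)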
if __name__ == "__main__":
    for name in listdir(b"."):
        print(name)
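
On Python 3.5 and later, the standard library's os.scandir covers the same need without ctypes: it returns a lazy iterator, so you can start processing names before the whole directory has been read, and it also works on Windows. The following is only a minimal sketch, not part of the code copied above; the function name iter_names is made up for illustration, and the with-statement form assumes Python 3.6 or later.

```python
import os


def iter_names(path):
    """Yield entry names in `path` one at a time, without building a full list."""
    # os.scandir returns a lazy iterator; "." and ".." are never included.
    with os.scandir(path) as entries:
        for entry in entries:
            yield entry.name


if __name__ == "__main__":
    for name in iter_names("."):
        print(name)
```

Each entry is a DirEntry object, so entry.is_file() or entry.path can be used instead of entry.name if you need more than the bare name.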