I'd like to recursively walk a directory, but I want Python to break out of any single listdir if it encounters a directory with more than 100 files. Basically, I'm searching for a .TXT file, but I want to avoid directories holding large DPX image sequences (usually 10,000 files). Since DPXs live in directories by themselves with no subdirectories, I'd like to break out of that loop ASAP.
So, long story short: if Python encounters a file matching ".DPX$", it stops listing that subdirectory, backs out, skips it, and continues the walk in the other subdirectories.
Is it possible to break out of a directory listing loop before all the results are returned?
If by 'directory listing loop' you mean os.listdir(), then no. It returns the complete list in a single call, so there is no loop to break out of. You could, however, look at os.path.walk() or os.walk() and just remove all the directories that contain DPX files. If you use os.walk() and are walking top-down, you can control which directories Python walks into by modifying the list of directories in place. os.path.walk() (Python 2 only; it was removed in Python 3) lets you choose where you walk through its visit function.
According to the documentation for os.walk:
When topdown is True, the caller can modify the dirnames list in-place (e.g., via del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, or to impose a specific order of visiting. Modifying dirnames when topdown is False is ineffective, since the directories in dirnames have already been generated by the time dirnames itself is generated.
So in theory, if you empty out dirnames, then os.walk will not recurse into any further directories. Note the comment about "...via del or slice assignment": you cannot simply do dirnames = [], because that only rebinds the local name and leaves the list that os.walk holds untouched. Use del dirnames[:] or dirnames[:] = [] instead.
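As a minimal sketch of the pruning described above (the function name find_txt_files and the case-insensitive extension checks are assumptions made for illustration, not part of the original answers), the walk could look like the following. Note that os.walk has already listed a directory by the time it yields it, so this only stops the descent below a DPX directory; it does not shorten the listing of the DPX directory itself.

```python
import os


def find_txt_files(root):
    """Yield paths of .txt files under root, skipping DPX directories.

    Hypothetical helper for illustration only.
    """
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        if any(name.lower().endswith(".dpx") for name in filenames):
            # Slice assignment empties the very list object os.walk holds,
            # so nothing below this directory is visited.
            # A plain "dirnames = []" would only rebind the local name.
            dirnames[:] = []
            continue
        for name in filenames:
            if name.lower().endswith(".txt"):
                yield os.path.join(dirpath, name)
```

Calling list(find_txt_files(some_root)) collects the matches; any directory that contains a .dpx file contributes nothing, and neither does anything beneath it.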
The right way to avoid allocating the whole list of names, as os.listdir does, is to use the OS-level functions, as @Charles Duffy said.
This is inspired by another post: List files in a folder as a stream to begin process immediately
I added a solution to the OP's specific question and used the re-entrant version of the function (readdir_r).
import os
import sys
from ctypes import (CDLL, POINTER, Structure, byref, c_byte, c_char, c_char_p,
                    c_int, c_long, c_ushort, get_errno)
from ctypes.util import find_library


class c_dir(Structure):
    """Opaque type for directory entries, corresponds to struct DIR"""
    pass


class c_dirent(Structure):
    """Directory entry"""
    # FIXME not sure these are the exactly correct types!
    _fields_ = (
        ('d_ino', c_long),           # inode number
        ('d_off', c_long),           # offset to the next dirent
        ('d_reclen', c_ushort),      # length of this record
        ('d_type', c_byte),          # type of file; not supported by all file system types
        ('d_name', c_char * 4096),   # filename
    )


c_dirent_p = POINTER(c_dirent)
c_dirent_pp = POINTER(c_dirent_p)
c_dir_p = POINTER(c_dir)

# use_errno=True is required for get_errno() to report libc's errno
c_lib = CDLL(find_library("c"), use_errno=True)

opendir = c_lib.opendir
opendir.argtypes = [c_char_p]
opendir.restype = c_dir_p

readdir_r = c_lib.readdir_r
readdir_r.argtypes = [c_dir_p, c_dirent_p, c_dirent_pp]
readdir_r.restype = c_int

closedir = c_lib.closedir
closedir.argtypes = [c_dir_p]
closedir.restype = c_int


def listdirx(path):
    """
    A generator to return the names of files in the directory passed in
    """
    dir_p = opendir(os.fsencode(path))
    if not dir_p:
        err = get_errno()
        raise OSError(err, os.strerror(err), path)
    entry = c_dirent()       # buffer that readdir_r fills in; owned by ctypes
    result_p = c_dirent_p()  # set to NULL by readdir_r at end of directory
    try:
        while True:
            # readdir_r returns 0 on success, or an error number on failure
            res = readdir_r(dir_p, byref(entry), byref(result_p))
            if res:
                raise OSError(res, os.strerror(res), path)
            if not result_p:
                break
            name = os.fsdecode(result_p.contents.d_name)
            if name not in (".", ".."):
                yield name
    finally:
        closedir(dir_p)


if __name__ == '__main__':
    path = sys.argv[1]
    max_per_dir = int(sys.argv[2])
    for idx, entry in enumerate(listdirx(path)):
        if idx >= max_per_dir:
            break
        print(entry)
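On Python 3.6 or later, a similar early exit may be possible without ctypes, since os.scandir() returns an iterator of entries rather than a full list. The following is only a sketch under assumptions: the function name find_txt_lazy, the max_entries parameter, and the decision to discard anything already seen in an abandoned directory are all illustrative choices, and it swaps the readdir_r approach for os.scandir().

```python
import os


def find_txt_lazy(root, max_entries=100):
    """Yield .txt paths under root, abandoning a directory as soon as a
    .dpx file is seen or more than max_entries entries have been read.

    Hypothetical helper for illustration only.
    """
    stack = [root]
    while stack:
        current = stack.pop()
        subdirs, matches = [], []
        abandoned = False
        # os.scandir() hands back entries one at a time, so breaking out
        # of the loop stops reading the directory early.
        with os.scandir(current) as entries:
            for count, entry in enumerate(entries, start=1):
                name = entry.name.lower()
                if name.endswith(".dpx") or count > max_entries:
                    abandoned = True
                    break
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.path)
                elif name.endswith(".txt"):
                    matches.append(entry.path)
        if abandoned:
            continue  # drop everything collected from this directory
        for path in matches:
            yield path
        stack.extend(subdirs)
```

Because directory order is not guaranteed, a .txt file may be read before the first .dpx file in the same directory; buffering matches per directory and dropping them on abandonment keeps the "skip that subdirectory" behaviour the question asks for.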