I need to iterate through a potentially very large (arbitrarily large) directory. From what I understand, the regular glob.glob function stores a list of all the matching filenames in memory, but the glob.iglob function uses an iterator. So using the regular glob.glob function is out of the question, since there may be a lot of files in the directory.
My problem is that iglob iterates through the directory in a seemingly random order. I would like it to iterate through the files in alphabetical order. I cannot get a list of all the filenames at once and just sort them, so I am wondering if there is a way to make iglob iterate through the directory in alphabetical order.
No, there isn't, not without reading the entire contents of the directory into memory. The operating system returns the filenames in directory order, and it too would have to read the full contents into memory in order to sort them.
You could sort the results after iglob() has matched the files, provided that set is small enough to fit into memory, by calling sorted() on the iglob() output:
for filename in sorted(iglob(path)):
Note that iglob() already loads all entries of a single directory into a list when not recursing into subdirectories (partly because fnmatch.filter() returns a list).
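For reference, here is a minimal runnable sketch of the sorted() approach. The helper name sorted_matches and the throwaway demo files are made up for illustration. Keep in mind that the whole list of matches is held in memory before the loop starts, so this only helps when the set of matches is small enough.

```python
import glob
import os
import tempfile


def sorted_matches(pattern):
    """Return every path matching `pattern`, sorted by code point.

    Hypothetical helper: sorted() drains the iglob() iterator into a
    list first, so all matches are in memory at once.
    """
    return sorted(glob.iglob(pattern))


if __name__ == "__main__":
    # Demo data only: a temporary directory with three empty files.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("b.txt", "c.txt", "a.txt"):
            open(os.path.join(tmp, name), "w").close()
        # glob.escape() protects any special characters in the directory path.
        pattern = os.path.join(glob.escape(tmp), "*.txt")
        for filename in sorted_matches(pattern):
            print(os.path.basename(filename))
```

Note that sorted() on strings compares code points, so uppercase names sort before lowercase ones. Pass key=str.lower to sorted() if you want a case-insensitive order.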
From the glob module's documentation:
The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell. No tilde expansion is done, but *, ?, and character ranges expressed with [] will be correctly matched. This is done by using the os.listdir() and fnmatch.fnmatch() functions in concert, and not by actually invoking a subshell.
And if we look at the documentation for os.listdir:
os.listdir(path)
Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order. It does not include the special entries '.' and '..' even if they are present in the directory.
So glob.glob is not guaranteed to return the files in alphabetical order. No such ordering is stated anywhere in the documentation, and relying on it is a bug. If you want an ordered sequence, you must sort the result. It follows that there is no way to make iglob return a sorted result, since it does not even have all the results available at once.
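To make that concrete, here is a rough sketch of lazy matching. iter_matching is a hypothetical helper, not something the glob module provides, and it skips details that glob handles (hidden files, patterns with directory parts). It uses os.scandir() and fnmatch.fnmatch() to yield one name at a time, which keeps memory use low but means names come out in whatever order the operating system reports them.

```python
import fnmatch
import os


def iter_matching(directory, pattern):
    """Lazily yield entry names in `directory` that match `pattern`.

    Hypothetical helper for illustration. Names are produced in
    directory order, which is arbitrary; nothing is sorted because the
    generator never sees more than one entry at a time.
    """
    with os.scandir(directory) as entries:
        for entry in entries:
            if fnmatch.fnmatch(entry.name, pattern):
                yield entry.name
```

Getting a sorted sequence out of this still means collecting the names first, for example with sorted(iter_matching(directory, "*.txt")).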
If memory is really a problem then you have two choices:
iglob
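As an illustration only, and not necessarily one of the choices this answer had in mind, one way to get sorted iteration with bounded memory is an external merge sort: read the directory in chunks, sort each chunk, spill it to a temporary file, then merge the sorted runs lazily with heapq.merge(). The function name iter_sorted_names and the chunk_size default are made up for this sketch. It assumes filenames contain no newline characters and that the number of runs stays below the open-file limit of the process.

```python
import heapq
import os
import tempfile
from itertools import islice


def iter_sorted_names(directory, chunk_size=100_000):
    """Yield entry names of `directory` in sorted (code point) order.

    Hypothetical sketch: at most about `chunk_size` names are held in
    memory while the runs are built; the sorted runs live in temporary
    files. Assumes no filename contains a newline.
    """
    run_files = []
    try:
        with os.scandir(directory) as entries:
            while True:
                # Pull the next chunk of names and sort it in memory.
                chunk = sorted(e.name for e in islice(entries, chunk_size))
                if not chunk:
                    break
                # newline="\n" disables newline translation on read and write.
                run = tempfile.TemporaryFile(
                    "w+", encoding="utf-8", errors="surrogateescape", newline="\n"
                )
                run.writelines(name + "\n" for name in chunk)
                run.seek(0)
                run_files.append(run)
        # Each run is already sorted, so a lazy k-way merge is enough.
        streams = [(line.rstrip("\n") for line in run) for run in run_files]
        yield from heapq.merge(*streams)
    finally:
        for run in run_files:
            run.close()
```

You would use it like any other iterator, for example for name in iter_sorted_names("/some/dir"), where the path is a placeholder. The trade-off is one full pass over the directory plus temporary disk space before the first name is yielded.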