What is the fastest way to iterate over all files in a directory using NTFS and Windows 7 when the directory contains more than 2,500,000 files? All files sit flat in the top-level directory.
Currently I use:

    import os

    for root, subFolders, files in os.walk(rootdir):
        for file in files:
            f = os.path.join(root, file)
            with open(f) as cf:
                pass  # [...] process the file here
But it is very, very slow. The process has been running for about an hour, still has not processed a single file, and its memory usage keeps growing by about 2 kB per second.
My guess was that os.walk walks the directory tree bottom-up. If you have a deep tree with many leaves, this could lead to performance penalties, or at least to an increased "startup" time, since walk has to read lots of data before processing the first file.
All of this is speculative, but have you tried forcing a top-down exploration? (Note that topdown=True is actually the documented default of os.walk, so passing it explicitly mainly rules the traversal order out as the cause.)

    for root, subFolders, files in os.walk(rootdir, topdown=True):
        ...
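A minimal sketch of why os.walk struggles with a flat directory, assuming Python 3: for a directory with no subdirectories, os.walk yields exactly one (root, dirs, files) tuple, and that files list already holds every name, so the loop body only starts after the whole directory has been read. The helper `make_flat_dir` and the file count are made up for the demonstration.

```python
import os
import shutil
import tempfile


def make_flat_dir(n):
    """Hypothetical helper: create a temporary directory holding n empty files."""
    root = tempfile.mkdtemp()
    for i in range(n):
        with open(os.path.join(root, "file_%d.txt" % i), "w"):
            pass
    return root


root = make_flat_dir(5)
steps = list(os.walk(root, topdown=True))

# A flat directory produces a single tuple from os.walk ...
print(len(steps))        # 1
# ... and its file list was fully built before that tuple was yielded.
print(len(steps[0][2]))  # 5

shutil.rmtree(root)  # clean up the demo directory
```

With 2.5 million entries, that single `files` list is what has to be assembled before the first `open` call can run.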
EDIT:
As the files appear to be in a flat directory, maybe glob.iglob could lead to better performance by returning an iterator (whereas other methods such as os.walk, os.listdir or glob.glob first build the list of all files). Could you try something like this:
    import glob
    import os

    # note: '*.*' only matches names that contain a dot
    for infile in glob.iglob(os.path.join(rootdir, '*.*')):
        pass  # ...
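A short sketch of the iglob approach, assuming Python 3 and a throwaway temporary directory with made-up file names: iglob hands back an iterator, so next() pulls one match at a time. It also shows a pitfall of the pattern above: '*.*' only matches names containing a dot, so a file without an extension is silently skipped, while '*' matches every non-hidden name.

```python
import glob
import os
import tempfile

root = tempfile.mkdtemp()
# Hypothetical file names: one with an extension, one without.
for name in ("report.txt", "README"):
    with open(os.path.join(root, name), "w"):
        pass

matches = glob.iglob(os.path.join(root, "*.*"))
first = next(matches)            # pulls a single match from the iterator
print(os.path.basename(first))   # report.txt

dotted = {os.path.basename(p) for p in glob.iglob(os.path.join(root, "*.*"))}
everything = {os.path.basename(p) for p in glob.iglob(os.path.join(root, "*"))}
print(sorted(dotted))      # ['report.txt']
print(sorted(everything))  # ['README', 'report.txt']
```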
I found that os.scandir (in the Python standard library since 3.5) actually seems to do the trick, also on Windows! (As noted in the comments, it works equally well on macOS.)
Consider the following example: "retrieve 100 paths from a folder that contains millions of files". os.scandir achieves this in a fraction of a second:
    import os
    from itertools import islice
    from pathlib import Path

    path = Path("path to a folder with a lot of files")
    paths = [i.path for i in islice(os.scandir(path), 100)]
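A sketch of how the scandir approach could replace the loop from the question, assuming Python 3.6+ (for the context-manager form of os.scandir). The generator `iter_files` and the `process` step are hypothetical names. DirEntry.is_file() can usually answer from information already returned by the directory listing, so filtering out subdirectories does not normally cost an extra system call per entry.

```python
import os
import tempfile
from itertools import islice


def iter_files(folder):
    """Yield the paths of regular files in folder one at a time.

    os.scandir reads the directory lazily, so the first path is available
    without reading the whole listing first.
    """
    with os.scandir(folder) as entries:  # closes the OS handle when done
        for entry in entries:
            # is_file() usually reuses information from the listing itself
            if entry.is_file():
                yield entry.path


def process(path):
    """Hypothetical per-file work; here it just returns the file size."""
    with open(path, "rb") as cf:
        return len(cf.read())


# Demo on a throwaway directory with a few made-up files and one subfolder.
folder = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(folder, "f%d.txt" % i), "w") as fh:
        fh.write("x" * i)
os.mkdir(os.path.join(folder, "subdir"))

sizes = sorted(process(p) for p in iter_files(folder))
print(sizes)  # [0, 1, 2]

# Taking only the first N paths stops reading early, as in the example above.
first_two = list(islice(iter_files(folder), 2))
print(len(first_two))  # 2
```

Wrapping os.scandir in a `with` block is a design choice over passing it straight to islice: it guarantees the directory handle is released even when iteration stops early.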
All the other options I tested (iterdir, glob, iglob) take a ridiculous amount of time, even though they are supposed to return iterators, presumably because they read the whole directory listing internally before yielding the first entry:
    import glob

    paths = list(islice(path.iterdir(), 100))
    paths = list(islice(path.rglob("*"), 100))
    paths = list(islice(glob.iglob(str(path / "*.*")), 100))
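To reproduce the comparison on your own folder, a small timing harness could look like the following. It is a sketch under stated assumptions: `time_first_n` and `scandir_paths` are hypothetical helpers, the folder is a temporary one with 200 made-up files (point `folder` at a real directory with millions of files to see the effect), and no timings are claimed here.

```python
import glob
import os
import tempfile
import time
from itertools import islice
from pathlib import Path


def time_first_n(make_iter, n=100):
    """Return (seconds, items) needed to pull the first n items from make_iter()."""
    start = time.perf_counter()
    items = list(islice(make_iter(), n))
    return time.perf_counter() - start, items


def scandir_paths(folder):
    """Yield entry paths from os.scandir, closing the handle when done."""
    with os.scandir(folder) as entries:
        for entry in entries:
            yield entry.path


folder = Path(tempfile.mkdtemp())
for i in range(200):
    (folder / ("file_%d.txt" % i)).touch()

candidates = {
    "os.scandir": lambda: scandir_paths(folder),
    "Path.iterdir": folder.iterdir,
    "Path.rglob": lambda: folder.rglob("*"),
    "glob.iglob": lambda: glob.iglob(str(folder / "*.*")),
}

results = {}
for label, make_iter in candidates.items():
    seconds, items = time_first_n(make_iter)
    results[label] = len(items)
    print("%-13s %3d items in %.4f s" % (label, len(items), seconds))
```

Each candidate is wrapped in a zero-argument callable so that the time spent creating the iterator, which is where an up-front directory listing would show up, is included in the measurement.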