Quicker to os.walk or glob?

Tags: python, glob, traversal, os.walk, directory-walk

I'm messing around with file lookups in Python on a large hard disk. I've been looking at os.walk and glob. I usually use os.walk, as I find it much neater and it seems to be quicker (for directories of typical size).

Has anyone got any experience with both and could say which is more efficient? As I say, glob seems to be slower, but you can use wildcards etc., whereas with walk you have to filter the results yourself. Here are two examples of looking up core dumps, first with os.walk and then with glob:

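```python
import os
import re

core = re.compile(r"core\.\d*")
for root, dirs, files in os.walk("/path/to/dir/"):
    for name in files:
        # fullmatch (Python 3.4+) checks the whole name, so e.g. "score.1" is
        # not deleted; search() would have matched it too
        if core.fullmatch(name):
            path = os.path.join(root, name)
            print("Deleting: " + path)
            os.remove(path)
```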

```python
import os
from glob import iglob

for path in iglob("/path/to/dir/core.*"):
    print("Deleting: " + path)
    os.remove(path)
```
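The two snippets above do not search the same set of files: os.walk descends into every subdirectory, while the glob pattern only looks inside /path/to/dir itself. The sketch below (not from the original post; the function names find_with_walk and find_with_glob are hypothetical) shows that os.walk can use the same shell-style wildcards via fnmatch.filter, and that glob can recurse with `**` plus recursive=True (Python 3.5+). They can still differ in edge cases: by default glob skips hidden (dot) directories and can also return directories whose names match the pattern, whereas the files list from os.walk contains only non-directories.

```python
import fnmatch
import glob
import os


def find_with_walk(top, pattern="core.*"):
    """Recursively collect file paths whose name matches a shell-style wildcard."""
    matches = []
    for root, dirs, files in os.walk(top):
        # fnmatch.filter uses the same wildcard syntax as glob
        for name in fnmatch.filter(files, pattern):
            matches.append(os.path.join(root, name))
    return matches


def find_with_glob(top, pattern="core.*"):
    """Recursive glob equivalent; '**' only recurses when recursive=True."""
    # glob.escape keeps wildcard characters in `top` from being interpreted
    full_pattern = os.path.join(glob.escape(top), "**", pattern)
    return list(glob.iglob(full_pattern, recursive=True))
```

For example, `find_with_glob("/path/to/dir/")` should return the same core files as `find_with_walk("/path/to/dir/")` for a tree without hidden directories.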


asked Jan 19 '12 18:01

joedborg

2 Answers

I did some research on a small cache of web pages spread over 1000 directories. The task was to count the total number of files in those directories. The output is:

```
os.listdir: 0.7268s, 1326786 files found
os.walk: 3.6592s, 1326787 files found
glob.glob: 2.0133s, 1326786 files found
```

As you can see, os.listdir is the quickest of the three, and glob.glob is still quicker than os.walk for this task.

The source:

```python
import glob
import os
import time

# os.listdir and glob count every entry in ./0 ... ./999 (non-recursive)
n, t = 0, time.time()
for i in range(1000):
    n += len(os.listdir("./%d" % i))
t = time.time() - t
print("os.listdir: %.4fs, %d files found" % (t, n))

# os.walk recurses from "./" and counts only non-directory entries
n, t = 0, time.time()
for root, dirs, files in os.walk("./"):
    for file in files:
        n += 1
t = time.time() - t
print("os.walk: %.4fs, %d files found" % (t, n))

n, t = 0, time.time()
for i in range(1000):
    n += len(glob.glob("./%d/*" % i))
t = time.time() - t
print("glob.glob: %.4fs, %d files found" % (t, n))
```
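For comparison, here is an untimed sketch (not part of the original benchmark) that counts files with os.scandir, available since Python 3.5 and used internally by os.walk since then (PEP 471); the `with` form needs Python 3.6+. The helper names count_files_scandir, count_files_walk and timed are hypothetical. It measures with time.perf_counter, which is meant for short intervals, instead of time.time. The scandir version counts only regular files, so its total can differ from os.walk when the tree contains symlinks or special files.

```python
import os
import time


def count_files_scandir(top):
    """Count regular files under top, recursing with an explicit stack."""
    total = 0
    stack = [top]
    while stack:
        path = stack.pop()
        with os.scandir(path) as it:
            for entry in it:
                # DirEntry.is_dir()/is_file() can often answer from the
                # directory listing itself, avoiding a stat() call per entry
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total += 1
    return total


def count_files_walk(top):
    """Count non-directory entries under top, as the os.walk loop above does."""
    return sum(len(files) for _, _, files in os.walk(top))


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call of func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

For example, `timed(count_files_scandir, "./")` returns a `(count, seconds)` pair. Directory metadata cached by the operating system can make a second run much faster than the first, so it is worth running each variant several times before comparing.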


answered Sep 23 '22 10:09

a5kin

Don't waste your time on optimization before measuring/profiling. Focus on making your code simple and easy to maintain.

For example, in your code you precompile the regex, which gives you little or no speed boost, because the re module keeps an internal cache (re._cache) of compiled patterns.
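A minimal sketch for checking this yourself, assuming Python 3.4+ (for fullmatch): the module-level re.fullmatch looks the pattern up in the re module's cache on each call rather than recompiling it, so both functions below select the same names and differ only in per-call overhead. The file names are made up for illustration, and no timings are shown because they depend on the machine and Python version.

```python
import re
import timeit

PATTERN = r"core\.\d*"
# Hypothetical file names, repeated to make the timing measurable
names = ["core.1234", "notes.txt", "core.", "score.7"] * 250

compiled = re.compile(PATTERN)


def with_compiled():
    return [n for n in names if compiled.fullmatch(n)]


def with_module_function():
    # re.fullmatch compiles PATTERN on first use, then reuses the cached object
    return [n for n in names if re.fullmatch(PATTERN, n)]


if __name__ == "__main__":
    for func in (with_compiled, with_module_function):
        print(func.__name__, timeit.timeit(func, number=50))
```

Whatever the difference turns out to be, it is per call and is usually small next to the file system I/O of walking a large disk.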

- Keep it simple.
- If it's slow, then profile it.
- Once you know exactly what needs to be optimized, make your tweaks and always document them.
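As a sketch of the "profile first" step, the standard library's cProfile and pstats modules can show where the time actually goes. Here count_files is a hypothetical workload standing in for your own code, and profile_call is a hypothetical helper name. For a whole script, `python -m cProfile -s cumulative your_script.py` produces a similar report from the command line.

```python
import cProfile
import io
import os
import pstats


def count_files(top):
    """Hypothetical workload: count non-directory entries under top."""
    return sum(len(files) for _, _, files in os.walk(top))


def profile_call(func, *args, limit=10):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    # Sort by cumulative time so the most expensive call chains come first
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(limit)
    return result, out.getvalue()
```

Usage: `result, report = profile_call(count_files, "/path/to/dir/")`, then `print(report)`.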

Note that some optimizations made several years earlier can make code run slower than "non-optimized" code. This applies especially to modern JIT-based languages.

answered Sep 20 '22 10:09

Michał Šrajer
