 

How to improve searching with os.walk and fnmatch

I'm using os.walk and fnmatch with filters to search a PC's hard drive for all image files. This works fine, but it's extremely slow: it takes about 9 minutes to search roughly 70,000 images.

Any ideas on optimizing this code to run faster? Any other suggestions?

I'm using Python 2.7.2, by the way.

import fnmatch
import os

images = ['*.jpg', '*.jpeg', '*.png', '*.tif', '*.tiff']
matches = []

for root, dirnames, filenames in os.walk("C:\\"):
    for extension in images:
        for filename in fnmatch.filter(filenames, extension):
            matches.append(os.path.join(root, filename))
asked Feb 21 '23 by user1401950


2 Answers

I'm not one of those regex maniacs who always reach for the re hammer to solve every problem, but this actually ran a bit over twice as fast as your fnmatch version in my tests:

import os
import re

matches = []

# one case-insensitive pattern instead of one fnmatch pass per extension per directory
img_re = re.compile(r'.+\.(jpg|png|jpeg|tif|tiff)$', re.IGNORECASE)

for root, dirnames, filenames in os.walk(r"C:\windows"):
    matches.extend(os.path.join(root, name) for name in filenames if img_re.match(name))
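
As a side note, the matching could also be done without a regex at all, by comparing the lowercased extension against a set. This is just a sketch of that idea (I haven't timed it against the regex version, so treat the relative speed as an assumption):

import os

IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.tif', '.tiff'}

matches = []
for root, dirnames, filenames in os.walk(r"C:\windows"):
    for name in filenames:
        # os.path.splitext("photo.JPG") -> ("photo", ".JPG")
        if os.path.splitext(name)[1].lower() in IMAGE_EXTS:
            matches.append(os.path.join(root, name))
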
answered Feb 25 '23 by John Gaines Jr.


The Python code looks pretty much OK to me.

You could experiment with:

for root, dirnames, filenames in os.walk("C:\\"):
    for extension in images:
        matches.extend(os.path.join(root, filename) for filename
                       in fnmatch.filter(filenames, extension))

If that does not make a difference (I suspect it will not), your hard disk has probably become the bottleneck: remember, disk access is slow, and you're iterating over and listing the files of every directory on your system.

If the hard disk is the bottleneck, multiple dir /s ... commands should not be dramatically faster than the Python solution either.
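
A minimal timing sketch to check that, reusing the compiled img_re pattern from the other answer (the exact counts and times are of course machine-dependent): count how many filenames the walk touches and how long the whole run takes. If swapping the matcher barely changes the total, the traversal itself is what's costing you the 9 minutes.

import os
import re
import time

img_re = re.compile(r'.+\.(jpg|png|jpeg|tif|tiff)$', re.IGNORECASE)

start = time.time()
file_count = 0
matches = []
for root, dirnames, filenames in os.walk("C:\\"):
    file_count += len(filenames)  # every name the walk had to list
    matches.extend(os.path.join(root, name)
                   for name in filenames if img_re.match(name))
elapsed = time.time() - start

print "walked %d files, matched %d images in %.1f seconds" % (
    file_count, len(matches), elapsed)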

answered Feb 25 '23 by ChristopheD