How do I run os.walk in parallel in Python?

I wrote a simple app in Java that takes a list of paths and generates a file with all the file paths under that original list.

If I have paths.txt that has:

c:\folder1\
c:\folder2\
...
...
c:\folder1000\

My app runs a recursive function on each path in multiple threads, and returns a file with all the file paths under these folders.

Now I want to write this app in Python.

I've written a simple app that uses os.walk() to run through a given folder and print the filepaths to output.
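For reference, the sequential version described above might look like the following (a minimal sketch; `list_files` is an illustrative name, not from the question):

```python
import os

def list_files(root):
    """Yield the full path of every file under root, top-down."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)

if __name__ == "__main__":
    for path in list_files("."):
        print(path)
```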

Now I want to run it in parallel, and I've seen that Python has some modules for this: threading and multiprocessing.

What is the best way to do this, and how is it done?

Asked Aug 12 '12 by user1251654



1 Answer

Here is a multiprocessing solution:

from multiprocessing.pool import Pool
from multiprocessing import JoinableQueue as Queue
import os

def explore_path(path):
    directories = []
    nondirectories = []
    for filename in os.listdir(path):
        fullname = os.path.join(path, filename)
        if os.path.isdir(fullname):
            directories.append(fullname)
        else:
            nondirectories.append(filename)
    # write this directory's files to its own output file
    outputfile = path.replace(os.sep, '_') + '.txt'
    with open(outputfile, 'w') as f:
        for filename in nondirectories:
            print(filename, file=f)
    return directories

def parallel_worker():
    while True:
        path = unsearched.get()
        dirs = explore_path(path)
        for newdir in dirs:
            unsearched.put(newdir)
        unsearched.task_done()

# acquire the list of paths
with open('paths.txt') as f:
    paths = f.read().split()

unsearched = Queue()
for path in paths:
    unsearched.put(path)

with Pool(5) as pool:
    for i in range(5):
        pool.apply_async(parallel_worker)

unsearched.join()
print('Done')
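Since walking a directory tree is largely I/O-bound, a thread-based variant is also worth considering, even with the GIL. Here is a sketch using `concurrent.futures`, under the assumption that one worker per root directory is sufficient (`walk_one` and `walk_many` are illustrative names, not from the answer):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def walk_one(root):
    """Collect every file path under a single root directory."""
    found = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return found

def walk_many(roots, max_workers=5):
    """Walk each root in its own thread and merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(walk_one, roots)
    return [path for paths in results for path in paths]
```

Unlike the multiprocessing version, this shares memory freely between workers, so there is no need for an inter-process queue; the trade-off is that any CPU-bound filtering you add later will not run in parallel.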
Answered Dec 26 '22 by Raymond Hettinger