How can I limit the number of concurrent threads in Python?
For example, I have a directory with many files, and I want to process all of them, but only 4 at a time in parallel.
Here is what I have so far:
import glob
import Queue
import threading

def process_file(fname):
    # open file and do something

def process_file_thread(queue, fname):
    queue.put(process_file(fname))

def process_all_files(d):
    files = glob.glob(d + '/*')
    q = Queue.Queue()
    for fname in files:
        t = threading.Thread(target=process_file_thread, args=(q, fname))
        t.start()
    q.join()

def main():
    process_all_files('.')
    # Do something after all files have been processed
How can I modify the code so that only 4 threads are run at a time?
Note that I want to wait for all files to be processed and then continue and work on the processed files.
Generally, CPython executes only one thread at a time. A Python process cannot run threads in parallel, but it can run them concurrently by switching between threads during I/O-bound operations. This limitation is enforced by the Global Interpreter Lock (GIL), which prevents threads within the same process from executing Python bytecode at the same time.
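To see what that means in practice, here is a tiny sketch (not part of your code; time.sleep just stands in for an I/O wait): three threads together finish in roughly one second rather than three, because each releases the GIL while it waits.

import threading
import time

def fake_io(n):
    time.sleep(1)  # releases the GIL while "waiting on I/O"
    print("task %d done" % n)

start = time.time()
threads = [threading.Thread(target=fake_io, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("elapsed: %.1f s" % (time.time() - start))  # roughly 1 s, not 3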
For example, I have a directory with many files, and I want to process all of them, but only 4 at a time in parallel.
That's exactly what a thread pool does: You create jobs, and the pool runs 4 at a time in parallel. You can make things even simpler by using an executor, where you just hand it functions (or other callables) and it hands you back futures for the results. You can build all of this yourself, but you don't have to.*
The stdlib's concurrent.futures module is the easiest way to do this. (For Python 3.1 and earlier, see the backport.) In fact, one of the main examples is very close to what you want to do. But let's adapt it to your exact use case:
import concurrent.futures
import glob

def process_all_files(d):
    files = glob.glob(d + '/*')
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        fs = [executor.submit(process_file, file) for file in files]
        concurrent.futures.wait(fs)
If you wanted process_file to return something, that's almost as easy:
def process_all_files(d):
    files = glob.glob(d + '/*')
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        fs = [executor.submit(process_file, file) for file in files]
        for f in concurrent.futures.as_completed(fs):
            do_something(f.result())
And if you want to handle exceptions too… well, just look at the example; it's just a try/except around the call to result().
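For instance, a minimal sketch of that, reusing the process_file and do_something names from above (catching Exception broadly here is just one reasonable choice):

import concurrent.futures
import glob

def process_all_files(d):
    files = glob.glob(d + '/*')
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        fs = [executor.submit(process_file, file) for file in files]
        for f in concurrent.futures.as_completed(fs):
            try:
                # result() re-raises any exception process_file raised in its worker thread
                do_something(f.result())
            except Exception as e:
                # one failed file doesn't stop the rest; report it and move on
                print("processing failed: %s" % e)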
* If you want to build them yourself, it's not that hard. The source to multiprocessing.pool is well written and commented, and not that complicated, and most of the hard stuff isn't relevant to threading; the source to concurrent.futures is even simpler.
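For the curious, a minimal hand-rolled sketch might look like this (worker and NUM_WORKERS are illustrative names, and process_file is the function from the question): four worker threads pull file names off a Queue, and a None sentinel tells each worker to stop.

import glob
import threading
import Queue  # named queue in Python 3

NUM_WORKERS = 4

def worker(q):
    while True:
        fname = q.get()
        if fname is None:
            break  # sentinel: no more work for this worker
        process_file(fname)

def process_all_files(d):
    q = Queue.Queue()
    threads = [threading.Thread(target=worker, args=(q,)) for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for fname in glob.glob(d + '/*'):
        q.put(fname)
    for _ in threads:
        q.put(None)  # one sentinel per worker
    for t in threads:
        t.join()  # returns once every file has been processed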