store results ThreadPoolExecutor

I am fairly new to parallel processing with "concurrent.futures" and I am running some simple experiments. The code I have written seems to work, but I am not sure how to store the results. I have tried to create a list ("futures") and append the results to it, but that considerably slows down the procedure. I am wondering if there is a better way to do that. Thank you.

import concurrent.futures
import time

couple_ods= []
futures=[]

dtab={}
for i in range(100):
    for j in range(100):
       dtab[i,j]=i+j/2
       couple_ods.append((i,j))

avg_speed=100
def task(i):
    origin=i[0]
    destination=i[1]
    time.sleep(0.01)
    distance=dtab[origin,destination]/avg_speed
    return distance
start1=time.time()
def main():
    with concurrent.futures.ThreadPoolExecutor() as executor:
       for number in couple_ods:
          future=executor.submit(task,number)
          futures.append(future.result())

if __name__ == '__main__':
    main()
end1=time.time()

asked Aug 29 '18 17:08

Mike

1 Answer

When you call future.result(), that blocks until the value is ready. So, you’re not getting any benefits out of parallelism here—you start one task, wait for it to finish, start another, wait for it to finish, and so on.
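A minimal sketch of that difference, assuming a stand-in task that only sleeps (slow_task, blocking_style, submit_all_first and timed are hypothetical names, not from the question). It times the submit-then-result() pattern against submitting everything before collecting:

```python
import concurrent.futures
import time

def slow_task(x):
    # Stand-in for an I/O-bound job; time.sleep releases the GIL.
    time.sleep(0.05)
    return x * 2

def blocking_style(values):
    # Calls result() right after submit(), so tasks run one at a time.
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
        for v in values:
            results.append(executor.submit(slow_task, v).result())
    return results

def submit_all_first(values):
    # Submits every task first, then collects, so the tasks can overlap.
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
        futures = [executor.submit(slow_task, v) for v in values]
        return [f.result() for f in futures]

def timed(fn, values):
    # Returns the function's results together with the elapsed wall time.
    start = time.perf_counter()
    results = fn(values)
    return results, time.perf_counter() - start

values = list(range(8))
serial_results, serial_time = timed(blocking_style, values)
overlap_results, overlap_time = timed(submit_all_first, values)
print(f"blocking: {serial_time:.2f}s, submit-all-first: {overlap_time:.2f}s")
```

Both functions return the same list in the same order; only the elapsed time should differ.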

Note that the only part of your example that can benefit from threading is the time.sleep call, which releases the GIL (Global Interpreter Lock) while it waits. The rest of each task is CPU-bound Python computation, which means that (at least in CPython, MicroPython, and PyPy, which are the only complete implementations that come with concurrent.futures) the GIL will prevent more than one of your threads from progressing at a time.

Hopefully your real program is different. If it’s doing I/O-bound stuff (making network requests, reading files, etc.), or using an extension library like NumPy that releases the GIL around heavy CPU work, then it will work fine. But otherwise, you’ll want to use ProcessPoolExecutor here.

Anyway, what you want to do is append future itself to a list, so you get a list of all of the futures before waiting for any of them:

for number in couple_ods:
    future=executor.submit(task,number)
    futures.append(future)

And then, after you’ve started all of the jobs, you can start waiting for them. There are three simple options, and one complicated one when you need more control.

(1) You can just directly loop over them to wait for them in the order they were submitted:

for future in futures:
    result = future.result()
    dostuff(result)

(2) If you need to wait for them all to be finished before doing any work, you can just call wait:

futures, _ = concurrent.futures.wait(futures)
for future in futures:
    result = future.result()
    dostuff(result)
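One detail worth keeping in mind with wait: it returns two sets (done and not done), so the submission order is not preserved in them. A small sketch, assuming a trivial hypothetical task called square, that keeps the original list around to read results back in order:

```python
import concurrent.futures

def square(x):
    # Hypothetical stand-in task.
    return x * x

with concurrent.futures.ThreadPoolExecutor() as executor:
    submitted = [executor.submit(square, n) for n in range(10)]
    # Blocks until every future has finished (the default return_when).
    done, not_done = concurrent.futures.wait(submitted)
    # done is an unordered set; iterate the original list to keep order.
    ordered_results = [f.result() for f in submitted]
```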

(3) If you want to handle each one as soon as it’s ready, even if they come out of order, use as_completed:

for future in concurrent.futures.as_completed(futures): 
    dostuff(future.result())

Notice that the examples that use this function in the docs provide some way to identify which task has finished. If you need that, it can be as simple as passing each task an index and having it return (index, real_result), so that you can write for index, result in … for the loop.
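As an alternative to returning an index from the task, the futures can be kept as keys of a dict that maps each one back to its input. This is only a sketch: future_to_pair and results are hypothetical names, and the task is a simplified version of the one in the question.

```python
import concurrent.futures
import time

def task(pair):
    # Simplified version of the question's task.
    origin, destination = pair
    time.sleep(0.01)
    return origin + destination / 2

pairs = [(i, j) for i in range(3) for j in range(3)]
results = {}
with concurrent.futures.ThreadPoolExecutor() as executor:
    # Map each future back to the input it was submitted with.
    future_to_pair = {executor.submit(task, p): p for p in pairs}
    for future in concurrent.futures.as_completed(future_to_pair):
        # Completion order is arbitrary, but the dict key identifies the task.
        results[future_to_pair[future]] = future.result()
```

Storing by key means the arrival order no longer matters when the results are read later.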

(4) If you need more control, you can loop over waiting on whatever’s done so far:

while futures:
    done, futures = concurrent.futures.wait(futures, return_when=concurrent.futures.FIRST_COMPLETED)
    for future in done:
        result = future.result()
        dostuff(result)

That example does the same thing as as_completed, but you can write minor variations on it to do different things, like waiting for everything to be done but canceling early if anything raises an exception.
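One possible variation of that loop, sketched under the assumption that the first exception should stop the collection and cancel whatever has not started yet. The task, the failing input 2, and max_workers=2 are arbitrary choices for illustration:

```python
import concurrent.futures
import time

def task(n):
    # Hypothetical task: input 2 fails, everything else succeeds.
    time.sleep(0.01 * n)
    if n == 2:
        raise ValueError(f"task {n} failed")
    return n

results = []
error = None
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    pending = {executor.submit(task, n) for n in range(20)}
    while pending and error is None:
        done, pending = concurrent.futures.wait(
            pending, return_when=concurrent.futures.FIRST_COMPLETED)
        for future in done:
            if future.exception() is not None:
                # Remember the failure and stop waiting after this batch.
                error = future.exception()
            else:
                results.append(future.result())
    # Futures that have not started can still be cancelled;
    # ones that are already running will finish regardless.
    for future in pending:
        future.cancel()
```

If stopping at the first failure is all that is needed, calling wait once with return_when=concurrent.futures.FIRST_EXCEPTION is a shorter route to a similar effect.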

For many simple cases, you can just use the map method of the executor to simplify the first option. This works just like the builtin map function, calling a function once for each value in the argument and then giving you something you can loop over to get the results in the same order, but it does it in parallel. So:

for result in executor.map(task, couple_ods):
    dostuff(result)
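Applied to the original question, storing the results can then be a single list() call around map. The sketch below reuses the names from the question but shrinks the ranges to 5 so it runs quickly; by_pair is a hypothetical extra for looking results up by input.

```python
import concurrent.futures
import time

avg_speed = 100
couple_ods = [(i, j) for i in range(5) for j in range(5)]
dtab = {(i, j): i + j / 2 for i, j in couple_ods}

def task(pair):
    origin, destination = pair
    time.sleep(0.01)
    return dtab[origin, destination] / avg_speed

with concurrent.futures.ThreadPoolExecutor() as executor:
    # map yields results in the same order as couple_ods.
    distances = list(executor.map(task, couple_ods))

# Optional: pair each input with its result for lookup by key.
by_pair = dict(zip(couple_ods, distances))
```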

answered Oct 21 '22 03:10

abarnert

Tags: python, python-multithreading, concurrent.futures