
How does ThreadPoolExecutor().map differ from ThreadPoolExecutor().submit?

I was just very confused by some code that I wrote. I was surprised to discover that:

    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(f, iterable))

and

    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        results = list(map(lambda x: executor.submit(f, x), iterable))

produce different results. The first produces a list of whatever type f returns; the second produces a list of concurrent.futures.Future objects, each of which needs its result() method called to get the value that f returned.
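A minimal sketch of the difference, assuming a trivial `square` function as a stand-in for f (the name and the input range are only for illustration):

```python
import concurrent.futures


def square(x):
    # Hypothetical stand-in for f.
    return x * x


with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # map yields the return values of square directly, in input order.
    mapped = list(executor.map(square, range(5)))

    # submit returns one Future per call; result() unwraps the value.
    submitted = [executor.submit(square, x) for x in range(5)]
    unwrapped = [future.result() for future in submitted]

print(mapped)
print(all(isinstance(fut, concurrent.futures.Future) for fut in submitted))
print(unwrapped)
```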

My main concern is that this means that executor.map can't take advantage of concurrent.futures.as_completed, which seems like an extremely convenient way to evaluate the results of some long-running calls to a database that I'm making as they become available.

I'm not at all clear on how concurrent.futures.ThreadPoolExecutor objects work -- naively, I would prefer the (somewhat more verbose):

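    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        result_futures = list(map(lambda x: executor.submit(f, x), iterable))
        # as_completed lives in concurrent.futures; the loop variable is named
        # "future" so that it does not shadow the function f.
        results = [future.result() for future in concurrent.futures.as_completed(result_futures)]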

over the more concise executor.map in order to take advantage of a possible gain in performance. Am I wrong to do so?

Patrick Collins asked Dec 30 '13 11:12



1 Answer

The problem is that you convert the result of ThreadPoolExecutor.map to a list. If you instead iterate over the returned generator directly, the results are still yielded in the original order, but the loop starts running before all of them are ready. You can test this with the following example:

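    import time
    import concurrent.futures

    e = concurrent.futures.ThreadPoolExecutor(4)
    s = range(10)
    # time.sleep returns None, so every printed line is "None". What matters is
    # that the lines appear one by one rather than all at once after the last sleep.
    for i in e.map(time.sleep, s):
        print(i)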
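To make the timing visible, here is a minimal sketch along the same lines. The helper `slow_identity` and the delay values are assumptions chosen for illustration; the loop records how long after the start each result reached it.

```python
import concurrent.futures
import time


def slow_identity(seconds):
    # Hypothetical task: sleep, then hand the input back.
    time.sleep(seconds)
    return seconds


delays = [0.1, 0.2, 0.6]
start = time.monotonic()
arrival_times = []

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # Iterate over the generator directly instead of wrapping it in list().
    for value in executor.map(slow_identity, delays):
        arrival_times.append((value, time.monotonic() - start))

for value, elapsed in arrival_times:
    print(f"result {value} reached the loop after {elapsed:.2f}s")
```

The first result should reach the loop well before the 0.6 second task has finished, while the values still arrive in input order.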

The order is presumably kept because it is sometimes important to get results in the same order as the inputs you gave to map. The results are probably not wrapped in Future objects because a second pass over the list just to unwrap them can be too costly in some situations. And in most cases the next value is very likely to be ready by the time the loop has finished processing the current one, as in this example:

    import concurrent.futures

    # some_huge_list, crunch_number and do_some_stuff are placeholders
    # for your own data source and functions.
    executor = concurrent.futures.ThreadPoolExecutor()  # Or ProcessPoolExecutor
    data = some_huge_list()
    results = executor.map(crunch_number, data)
    finals = []

    for value in results:
        finals.append(do_some_stuff(value))

Here do_some_stuff may well take longer than crunch_number. If so, waiting for the results in order costs very little performance, and you keep the convenience of map.
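A runnable variant of that pattern might look like the sketch below. The bodies of `crunch_number` and `do_some_stuff`, the 50 ms sleeps, and the eight-item input are all made up for illustration; only the overall shape comes from the example above.

```python
import concurrent.futures
import time


def crunch_number(n):
    # Hypothetical worker: pretend to compute for 50 ms.
    time.sleep(0.05)
    return n * n


def do_some_stuff(value):
    # Hypothetical consumer step, run in the main thread.
    time.sleep(0.05)
    return value + 1


data = list(range(8))  # stand-in for some_huge_list()

start = time.monotonic()
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # The workers keep crunching while the main thread consumes earlier results.
    finals = [do_some_stuff(value) for value in executor.map(crunch_number, data)]
elapsed = time.monotonic() - start

print(finals)
print(f"elapsed: {elapsed:.2f}s")
```

A fully sequential version would sleep for about 0.8 seconds in total (eight items at 0.05 seconds per step). Overlapping the two steps should bring the elapsed time noticeably below that, though the exact figure depends on the machine.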

Also, because the worker threads (or processes) start at the beginning of the list you submitted and work their way to the end, the results should usually finish in roughly the order in which the iterator yields them. So in most cases executor.map is just fine. In some cases, for example when the order in which you process the values doesn't matter and the function you passed to map has widely varying run times, concurrent.futures.as_completed may be faster.
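The difference in ordering can be sketched as follows, assuming a hypothetical `slow_identity` task whose run time is its argument and a slow item placed first on purpose:

```python
import concurrent.futures
import time


def slow_identity(seconds):
    # Hypothetical task: sleep, then hand the input back.
    time.sleep(seconds)
    return seconds


delays = [0.6, 0.1, 0.3]  # the slowest task is submitted first

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # map blocks on the 0.6 s result before yielding the faster ones.
    map_order = list(executor.map(slow_identity, delays))

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(slow_identity, d) for d in delays]
    # as_completed yields each future as soon as it finishes.
    completion_order = [
        future.result() for future in concurrent.futures.as_completed(futures)
    ]

print(map_order)         # input order
print(completion_order)  # completion order, fastest first on an unloaded machine
```

With as_completed the loop body can start on the 0.1 second result right away, whereas with map it has to wait for the 0.6 second result first. If you need to know which input a finished future belongs to, a common approach is to keep a dict that maps each future to its argument.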

Kritzefitz answered Oct 05 '22 00:10