
Are these two Python snippets equivalent when using concurrent.futures?

I have two pieces of code, representative of a more complex scenario I am trying to debug. I am wondering if they are technically equivalent, and if not, why.

First one:

import time
from concurrent.futures import ThreadPoolExecutor

def cb(res):
    print("done", res)

def foo():
    time.sleep(3)
    res = 5
    cb(res)
    return res

with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(foo)
    print(future.result())

Second one:

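import time
from concurrent.futures import ThreadPoolExecutor

def cb2(fut):
    # Called with the finished future; fut.result() returns immediately here
    print("done", fut.result())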

def foo2():
    time.sleep(3)
    return 5

with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(foo2)
    future.add_done_callback(cb2)
    print(future.result())

The core of the issue is the following: I need to call a synchronous, slow operation (represented here by the sleep). When that operation completes, I have to perform subsequent fast operations. In the first snippet, I put these operations immediately after the slow synchronous one. In the second snippet, I put them in the callback.

In terms of implementation, I suspect the future creates a secondary thread, runs the code in the secondary thread, and this secondary thread will stop at the sync slow operation. Once this operation is completed, the secondary thread will keep going, and it can keep going either by executing the subsequent code or by calling the callbacks. I see no difference in these two pieces of code (apart from the fact that adding the callback allows injecting code from outside, an added flexibility), but I might be wrong, hence the question.

Note that I do understand that in the first case, the print is called when the future is still not resolved and in the second one it is, but it is assumed that the status is not relevant.

Stefano Borini asked Nov 07 '22 14:11


1 Answer

These two examples are not equivalent in terms of event ordering. Let’s look through the lifecycle of a Future. It goes roughly like this (reverse engineered from CPython’s source):

  • a Future is created
  • it is added to executor’s queue
  • it is popped from the queue by some free/idle thread from the threadpool
  • the function provided to submit() is called in that thread
  • the future is marked as FINISHED
  • the future broadcasts the ‘state changed’ event to all its waiters
  • callbacks are invoked (still in the same worker thread)
  • the worker thread becomes free/idle and may take another future from the queue
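The lifecycle above can be checked with a small experiment. This is only a sketch: the names work, on_done, gate and records are made up for illustration, and the Event is there solely to guarantee the callback is registered before the future finishes. It records which thread runs the submitted function and which runs the callback.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

records = {}

def work(gate):
    # Hold the worker until the main thread has registered the callback,
    # so the future cannot already be finished when add_done_callback runs.
    gate.wait()
    records["work_thread"] = threading.get_ident()
    return 5

def on_done(fut):
    # Record where the callback runs and what it sees on the future.
    records["callback_thread"] = threading.get_ident()
    records["done_in_callback"] = fut.done()
    records["result_in_callback"] = fut.result()

gate = threading.Event()
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(work, gate)
    future.add_done_callback(on_done)
    gate.set()
# Leaving the with block waits for the worker, so the callback has run by now.

same_thread = records["work_thread"] == records["callback_thread"]
print(same_thread, records["done_in_callback"], records["result_in_callback"])
```

On CPython this is expected to print True True 5: the callback runs in the worker thread, after the future is already done. Note that if a callback is added to a future that has already finished, add_done_callback calls it immediately in the calling thread instead.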

When you execute the statement print(future.result()), your main thread blocks and becomes one of the future’s waiters. It is unblocked right after the future switches to FINISHED, which happens before the callbacks are invoked. That means you cannot predict which print appears first in the console, the print in one of your callbacks or print(future.result()): the two now run concurrently. If your callbacks and the main thread (after future.result() returns) work on the same data, you risk data corruption unless access to that data is synchronized.
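To see that result() can return while a callback is still running, here is a minimal sketch (again with hypothetical names: slow_callback, main_resumed, observations). The callback waits, with a timeout, for a signal that the main thread sends only after result() has returned. If result() waited for callbacks to finish, the wait would time out and record False. This relies on the CPython behaviour described above, not on a documented guarantee.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

main_resumed = threading.Event()
observations = []

def work(gate):
    # Wait until the callback is registered, then finish.
    gate.wait()
    return 5

def slow_callback(fut):
    # True if the main thread got past result() while this callback
    # was still running; False if the 5 s timeout expired first.
    observations.append(main_resumed.wait(timeout=5))

gate = threading.Event()
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(work, gate)
    future.add_done_callback(slow_callback)
    gate.set()
    value = future.result()  # returns once the future is FINISHED
    main_resumed.set()       # the callback may still be waiting here

print(value, observations)
```

On CPython this is expected to print 5 [True], meaning the main thread continued past result() before the callback completed. Any state shared between the two therefore needs a lock or a similar synchronization primitive.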

Andrii Maletskyi answered Nov 14 '22 22:11