It is unclear how to properly time out workers of joblib's Parallel in Python. Others have asked similar questions. In my example I am using a pool of 50 joblib workers with the threading backend.
Parallel Call (threading):
from joblib import Parallel, delayed

output = Parallel(n_jobs=50, backend='threading')(
    delayed(get_output)(INPUT) for INPUT in list
)
Here, Parallel hangs without errors as soon as len(list) <= n_jobs, but only when n_jobs >= -1.
To circumvent this issue, people give instructions on how to add a timeout decorator to the function passed to Parallel (get_output(INPUT) in the example above) using multiprocessing:
Main function (decorated):
@with_timeout(10)  # multiprocessing
def get_output(INPUT):  # threading
    output = do_stuff(INPUT)
    return output
Multiprocessing Decorator:
import functools
import multiprocessing.pool

def with_timeout(timeout):
    def decorator(decorated):
        @functools.wraps(decorated)
        def inner(*args, **kwargs):
            # A new single-thread pool is created on every call.
            pool = multiprocessing.pool.ThreadPool(1)
            async_result = pool.apply_async(decorated, args, kwargs)
            try:
                return async_result.get(timeout)
            except multiprocessing.TimeoutError:
                return
        return inner
    return decorator
Adding the decorator to the otherwise working code results in a memory leak after roughly twice the length of the timeout, plus a crash of Eclipse.
Where is this leak in the decorator?
How can I time out threads during multiprocessing in Python?
It is not possible to kill a Thread in Python without a hack.
The memory leak you are experiencing is due to the accumulation of threads that you believe have been killed. To verify this, inspect the number of threads your application is running; you will see it grow slowly.
Under the hood, the thread of the ThreadPool is not terminated but keeps running your function until the end.
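As a rough way to see this for yourself, the sketch below reuses the decorator from the question and counts live threads with threading.active_count() before and after a few timed-out calls. The names slow_task and count_threads_after_calls are made up for this illustration, and the exact thread count will vary with the Python version because the pool's helper threads may or may not have been cleaned up yet.

```python
import functools
import multiprocessing
import multiprocessing.pool
import threading
import time


def with_timeout(timeout):
    """The decorator from the question, with the same behavior."""
    def decorator(decorated):
        @functools.wraps(decorated)
        def inner(*args, **kwargs):
            pool = multiprocessing.pool.ThreadPool(1)
            async_result = pool.apply_async(decorated, args, kwargs)
            try:
                return async_result.get(timeout)
            except multiprocessing.TimeoutError:
                # The caller stops waiting, but the worker thread keeps going.
                return None
        return inner
    return decorator


@with_timeout(0.05)
def slow_task(seconds):
    """Hypothetical stand-in for a function that outlives its timeout."""
    time.sleep(seconds)
    return seconds


def count_threads_after_calls(n_calls, seconds=1.0):
    """Call slow_task n_calls times and report thread counts around the calls."""
    before = threading.active_count()
    results = [slow_task(seconds) for _ in range(n_calls)]
    after = threading.active_count()
    return before, after, results
```

Calling count_threads_after_calls(3) should return three None results together with an "after" count that is higher than the "before" count, since every timed-out call leaves at least its worker thread sleeping in the background.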
A thread cannot be killed because threads share memory with the parent process. It is therefore very hard to kill a thread while preserving the memory integrity of your application.
Java developers figured it out long ago.
If you can run your function in a separate process, then you could easily rely on a timeout logic where the process itself is killed once the timeout is reached.
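A minimal sketch of that idea using only the standard library could look like the following. It is not what Pebble or joblib do internally; run_with_timeout, square and sleep_then_return are names invented for this example, and the start_method parameter is an assumption added so the caller can pick "fork" or "spawn" explicitly. With "spawn" or "forkserver", the function and its arguments must be picklable and the calling code should sit under an if __name__ == "__main__": guard.

```python
import multiprocessing
import time


def _run_and_send(conn, func, args, kwargs):
    """Child-process entry point: run func and send back (ok, value)."""
    try:
        conn.send((True, func(*args, **kwargs)))
    except Exception as exc:
        conn.send((False, exc))
    finally:
        conn.close()


def run_with_timeout(func, timeout, args=(), kwargs=None, start_method=None):
    """Run func in a child process and kill it after timeout seconds.

    Returns func's result, or None on timeout (like the question's decorator).
    Exceptions raised by func are re-raised in the caller.
    """
    ctx = multiprocessing.get_context(start_method)  # None = platform default
    recv_conn, send_conn = ctx.Pipe(duplex=False)
    process = ctx.Process(
        target=_run_and_send, args=(send_conn, func, args, kwargs or {})
    )
    process.start()
    send_conn.close()  # the parent only needs the receiving end
    try:
        if not recv_conn.poll(timeout):
            return None  # timed out; the child is killed in the finally block
        try:
            ok, value = recv_conn.recv()
        except EOFError:
            raise RuntimeError("child process exited without sending a result")
        if ok:
            return value
        raise value
    finally:
        if process.is_alive():
            process.terminate()  # unlike a thread, a process can be killed
        process.join()
        recv_conn.close()


def square(x):
    """Hypothetical fast function."""
    return x * x


def sleep_then_return(seconds):
    """Hypothetical slow function."""
    time.sleep(seconds)
    return seconds
```

For example, run_with_timeout(square, 5, args=(7,)) returns the result as usual, while run_with_timeout(sleep_then_return, 0.2, args=(5,)) gives up after about 0.2 seconds and terminates the child, so nothing is left running in the background. The price is process start-up and pickling overhead on every call.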
The Pebble library already offers decorators with a timeout.