
combining tqdm with delayed execution with dask in python

Tags: python, tqdm, dask

tqdm and dask are both great packages for iteration in Python. tqdm provides the progress bar, dask provides the parallel (multi-threaded) execution, and together they can make long-running loops less frustrating. However, I'm having trouble combining the two.

For example, the following code builds delayed tasks in dask inside a tqdm.trange progress bar. The problem is that creating the delayed objects is nearly instantaneous, so the progress bar finishes immediately, while the real computation time is spent in the compute call.

from dask import delayed, compute
from tqdm import trange
from time import sleep, time

ct = time()
result = []

def fun(x):
    sleep(x)
    return x

for i in trange(10):
    result.append(delayed(fun)(i))

print(compute(result))

How can I attach the progress bar to the actual execution in the compute call?

Dimgold asked Jun 11 '17 12:06



2 Answers

Consider Dask's progress bar

from dask.diagnostics import ProgressBar

with ProgressBar():
    compute(result)

Build a diagnostic of your own

You can use Dask's diagnostics callback (plugin) architecture to get a signal at the end of every task: http://dask.pydata.org/en/latest/diagnostics.html

Here is an example of someone doing exactly this: https://github.com/tqdm/tqdm/issues/278
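
A minimal sketch of such a diagnostic, assuming dask's `dask.callbacks.Callback` base class and its `_start_state`, `_posttask` and `_finish` hooks. The class name `TqdmTaskBar` is hypothetical, and the sleep times are shortened so the example finishes quickly; it is an illustration, not the code from the linked issue.

```python
from time import sleep

from dask import compute, delayed
from dask.callbacks import Callback
from tqdm import tqdm


class TqdmTaskBar(Callback):
    """Hypothetical helper: advance a tqdm bar once per finished dask task."""

    def _start_state(self, dsk, state):
        # Size the bar from the scheduler state: every task is in one of these groups.
        total = sum(len(state[k]) for k in ("ready", "waiting", "running", "finished"))
        self.pbar = tqdm(total=total, desc="compute")

    def _posttask(self, key, result, dsk, state, worker_id):
        # Called by the local scheduler after each task completes.
        self.pbar.update(1)

    def _finish(self, dsk, state, errored):
        # Called once when the computation ends, whether or not it errored.
        self.pbar.close()


def fun(x):
    sleep(x / 100)  # shortened from sleep(x) so the sketch runs in well under a second
    return x


tasks = [delayed(fun)(i) for i in range(10)]

# The bar now advances during compute, not while the graph is being built.
with TqdmTaskBar():
    (values,) = compute(tasks)
```

Callbacks of this kind are only invoked by dask's local schedulers (threaded, multiprocessing, synchronous), so this sketch would not report progress on a distributed cluster.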

MRocklin answered Sep 19 '22 19:09


Based on the "Dask Integration" section of the tqdm documentation:

from tqdm.dask import TqdmCallback

with TqdmCallback(desc="compute"):
    ...
    arr.compute()

# or use callback globally
cb = TqdmCallback(desc="global")
cb.register()
arr.compute()

Applied to the code in the question:

from dask import delayed, compute
from tqdm.auto import tqdm
# from tqdm import trange
from time import sleep

from tqdm.dask import TqdmCallback

# ct = time()
result = []

def fun(x):
    sleep(x)
    return x

for i in tqdm(range(10)):
    result.append(delayed(fun)(i))

with TqdmCallback(desc="compute"):
    print(compute(result))

[Screenshot: output in Jupyter]

eldad-a answered Sep 17 '22 19:09