
dask: specify number of processes

Tags: python, dask

I am trying to use dask for some embarrassingly parallel processing. For some reason I have to use dask, although the task could easily be done with multiprocessing.Pool(5).map.

For example:

import dask
from dask import compute, delayed

def do_something(x): return x * x

data = range(10)
delayed_values = [delayed(do_something)(x) for x in data]
results = compute(*delayed_values, scheduler='processes')

It works, but apparently it uses only one process.

How can I configure dask so it uses a pool of 5 processes for this computation?

piokuc asked Jul 11 '18 12:07





2 Answers

You can pass the num_workers keyword to compute to set the number of processes it uses.

results = compute(*delayed_values, scheduler='processes', num_workers=5)
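To check that the setting takes effect, here is a minimal sketch that returns the worker's process ID along with each result. The names square_with_pid and run are made up for illustration and are not part of dask. The __main__ guard is there because process-based schedulers may re-import the main module in each worker. Since the tasks are tiny, fewer than 5 distinct PIDs may show up.

```python
import os

from dask import compute, delayed


def square_with_pid(x):
    # Return the result together with the ID of the process that computed it.
    return x * x, os.getpid()


def run(num_workers=5):
    # Hypothetical helper: build ten independent tasks and run them
    # on the multiprocessing scheduler with the requested pool size.
    tasks = [delayed(square_with_pid)(x) for x in range(10)]
    return compute(*tasks, scheduler='processes', num_workers=num_workers)


if __name__ == '__main__':
    results = run()
    print([value for value, _ in results])
    # At most num_workers distinct PIDs can appear here.
    print(len({pid for _, pid in results}), 'distinct worker process(es)')
```

If only one PID is reported even with num_workers=5, try making each task slower (for example with time.sleep) so the work is spread out before the first worker finishes everything.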
Scratch'N'Purr answered Oct 18 '22 19:10



You can also configure dask to use a custom process pool:

import dask
from multiprocessing.pool import Pool

dask.config.set(pool=Pool(5))

Or set the scheduler and the number of workers with a context manager:

with dask.config.set(scheduler='processes', num_workers=5):
    ...
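As a rough sketch, the context-manager form applied to the code from the question could look like this. The helper name run_in_context is made up for illustration. It assumes that compute, called without explicit arguments, picks up scheduler and num_workers from the active dask config.

```python
import dask
from dask import compute, delayed


def do_something(x):
    return x * x


def run_in_context(num_workers=5):
    # Hypothetical helper wrapping the question's computation.
    delayed_values = [delayed(do_something)(x) for x in range(10)]
    # Inside the block, compute() reads the scheduler and the
    # worker count from the config instead of keyword arguments.
    with dask.config.set(scheduler='processes', num_workers=num_workers):
        return compute(*delayed_values)


if __name__ == '__main__':
    print(run_in_context())
    # prints (0, 1, 4, 9, 16, 25, 36, 49, 64, 81)
```

The settings only apply inside the with block, so other computations in the same program keep their default scheduler.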

You may also want to read dask_scheduling or my previous answer.

moshevi answered Oct 18 '22 20:10
