How do we choose --nthreads and --nprocs per worker in Dask distributed?
I have 3 workers: 2 workers with 4 cores and one thread per core, and 1 worker with 8 cores (according to the output of the lscpu Linux command on each worker).
If we start Dask using processes, as in the following code, we get 8 workers, one for each core, with each worker allotted 2 GB of memory (16 GB total / 8 workers; this will vary depending on your laptop).
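A minimal sketch of such a process-based startup (the 8-worker count and the 2 GB memory limit are assumptions for an 8-core, 16 GB machine; adjust them to your hardware):

from dask.distributed import Client

# Use separate worker processes rather than threads, one per core.
# n_workers=8 and memory_limit='2GB' assume an 8-core, 16 GB machine.
client = Client(processes=True, n_workers=8, threads_per_worker=1, memory_limit='2GB')
print(client)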
Workers keep the scheduler informed of their data and use that scheduler to gather data from other workers when necessary to perform a computation. You can start a worker with the dask-worker command line application: $ dask-worker scheduler-ip:port.
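For example, assuming the scheduler reports the address tcp://192.168.1.100:8786 (a placeholder here; dask-scheduler prints the real address when it starts):

$ dask-scheduler                        # run on the scheduler machine; prints its address
$ dask-worker tcp://192.168.1.100:8786  # run on each worker machine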
Dask.distributed is a centrally managed, distributed, dynamic task scheduler. The central dask-scheduler process coordinates the actions of several dask-worker processes spread across multiple machines and the concurrent requests of several clients.
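A client attaches to that scheduler from Python; a minimal sketch, with the placeholder address again standing in for whatever dask-scheduler printed:

from dask.distributed import Client

# Connect to the already-running scheduler rather than spawning a local one.
client = Client("tcp://192.168.1.100:8786")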
You can launch a Dask cluster using mpirun or mpiexec and the dask-mpi command line tool. This depends on the mpi4py library. It only uses MPI to start the Dask cluster and not for inter-node communication.
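A sketch of such a launch, assuming a 4-rank MPI job and a shared filesystem for the scheduler file (rank 0 becomes the scheduler and the remaining ranks become workers):

$ mpirun -np 4 dask-mpi --scheduler-file scheduler.json

A client can then attach with Client(scheduler_file='scheduler.json').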
It depends on your workload
By default Dask creates a single process with as many threads as you have logical cores on your machine (as determined by multiprocessing.cpu_count()).
dask-worker ... --nprocs 1 --nthreads 8  # assuming you have eight cores
dask-worker ...                          # this is actually the default setting
Using few processes and many threads per process is good if you are doing mostly numeric workloads, such as are common in Numpy, Pandas, and Scikit-Learn code, which is not affected by Python's Global Interpreter Lock (GIL).
However, if you are spending most of your compute time manipulating Pure Python objects like strings or dictionaries then you may want to avoid GIL issues by having more processes with fewer threads each
dask-worker ... --nprocs 8 --nthreads 1
Based on benchmarking you may find that a more balanced split is better
dask-worker ... --nprocs 4 --nthreads 2
Using more processes avoids GIL issues, but adds costs due to inter-process communication. You would want to avoid many processes if your computations require a lot of inter-worker communication.
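The same configurations can also be set up from Python with LocalCluster; a sketch, with worker counts assuming an 8-core machine:

from dask.distributed import Client, LocalCluster

# A balanced 4-process x 2-thread split; swap in (1, 8) for the
# thread-heavy numeric case or (8, 1) for the process-heavy
# pure-Python case discussed above.
cluster = LocalCluster(n_workers=4, threads_per_worker=2)
client = Client(cluster)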