
Semaphores in dask.distributed?

I have a dask cluster with n workers and want the workers to run queries against a database. But the database can only handle m queries in parallel, where m < n. How can I model that in dask.distributed? At most m workers should work on such a task at the same time.

I have seen that distributed supports locks (http://distributed.readthedocs.io/en/latest/api.html#distributed.Lock). But a lock would allow only one query at a time, not m.

I have also seen that I could define resources per worker (https://distributed.readthedocs.io/en/latest/resources.html). But that does not fit either, as the database is independent of the workers. I would either have to define 1 database resource per worker (which leads to too many parallel queries), or distribute m database resources across n workers, which is difficult to set up on the cluster and suboptimal in execution.

Is it possible to define something like semaphores in dask to solve that?

Christian Trebing asked Feb 07 '18 15:02


1 Answer

You could probably hack something together with Locks and Variables.

A cleaner solution would be to implement Semaphores in much the same way Locks are implemented. Depending on your experience this may not be that hard (the lock implementation is 150 lines), and it would be a welcome pull request.

https://github.com/dask/distributed/blob/master/distributed/lock.py

MRocklin answered Oct 02 '22 15:10