Semaphores in dask.distributed?

Question

I have a dask cluster with n workers and want the workers to run queries against a database. The database can only handle m queries in parallel, where m < n. How can I model that in dask.distributed? At most m workers should be working on such a task at any one time.

I have seen that distributed supports locks (http://distributed.readthedocs.io/en/latest/api.html#distributed.Lock). But a lock would allow only one query at a time, not m.

I have also seen that I can define resources per worker (https://distributed.readthedocs.io/en/latest/resources.html). But that does not fit either, because the database is independent of the workers. I would either have to define one database resource per worker, which allows too many parallel queries, or distribute m database resources across n workers, which is awkward when setting up the cluster and suboptimal in execution.

Is it possible to define something like semaphores in dask to solve that?

MRocklin · Accepted Answer

You could probably hack something together with Locks and Variables.
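One way to read the "hack" above is a pool of m locks, one per query slot, where each task grabs whichever slot is free. The sketch below illustrates that idea with standard-library `threading.Lock` objects so it runs without a cluster. The helper names `acquire_any` and `run_limited` are hypothetical, and the assumption that `distributed.Lock` can be swapped in through a compatible non-blocking `acquire` should be checked against its documentation.

```python
import threading
import time


def acquire_any(locks, poll_interval=0.01):
    """Block until one of the locks is free, then return it (already acquired)."""
    while True:
        for lock in locks:
            # Non-blocking attempt: True only if this slot was free.
            if lock.acquire(blocking=False):
                return lock
        time.sleep(poll_interval)  # all m slots are busy; wait and retry


def run_limited(n_tasks=8, m_slots=3):
    """Run n_tasks threads through m_slots locks; return the peak concurrency."""
    locks = [threading.Lock() for _ in range(m_slots)]  # one lock per query slot
    guard = threading.Lock()  # protects the counters below
    state = {"active": 0, "peak": 0}

    def fake_query():
        lock = acquire_any(locks)
        try:
            with guard:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(0.02)  # stand-in for the real database query
            with guard:
                state["active"] -= 1
        finally:
            lock.release()  # free the slot even if the query fails

    threads = [threading.Thread(target=fake_query) for _ in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]
```

Polling is the weak point of this approach: waiting tasks wake up repeatedly instead of being notified, which is part of why a real semaphore is the cleaner option.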

A cleaner solution would be to implement Semaphores in much the same way Locks are implemented. Depending on your experience this may not be that hard (the lock implementation is 150 lines), and it would be a welcome pull request.

https://github.com/dask/distributed/blob/master/distributed/lock.py
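For reference, the semantics being asked for (n workers, at most m holding a permit at once) can be shown locally with Python's standard-library `threading.BoundedSemaphore`. This is only an analogy for the behavior a distributed semaphore would need, not dask code; the function name `peak_concurrency` and its parameters are made up for the illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(n_workers=6, m_permits=2, n_queries=12):
    """Run n_queries on n_workers threads with at most m_permits running at once."""
    db_slots = threading.BoundedSemaphore(m_permits)  # m permits for the database
    guard = threading.Lock()  # protects the counters below
    state = {"active": 0, "peak": 0}

    def query(i):
        with db_slots:  # blocks while m_permits queries are already running
            with guard:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(0.01)  # stand-in for the real database query
            with guard:
                state["active"] -= 1
        return i

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(query, range(n_queries)))
    return state["peak"], results
```

Calling `peak_concurrency(6, 2, 12)` returns a peak that never exceeds 2 even though 6 threads are available. As far as I know, later releases of distributed added a `distributed.Semaphore` class (constructed with `max_leases` and a `name`) that provides this behavior across workers; check the API documentation of your installed version before relying on it.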

Tags:

dask

dask-distributed

Christian Trebing
