 

distributed.worker Memory use is high but worker has no data to store to disk

distributed.worker - WARNING - Memory use is high but worker has no data to store to disk.  Perhaps some other process is leaking memory?  Process memory: 3.91 GB -- Worker memory limit: 2.00 GB
distributed.worker - WARNING - Worker is at 41% memory usage. Resuming worker. Process memory: 825.12 MB -- Worker memory limit: 2.00 GB

The warning above appears when I run code that applies an algorithm to my dataset. Having read through the documentation at https://distributed.dask.org/en/latest/worker.html, it's still not clear to me what impact this warning will have on the results. Does it only affect the speed or efficiency of the code, or will it change my results?

asked Feb 11 '20 16:02 by AHassett



1 Answer

That warning says that your worker process is using much more memory than the limit you configured (3.91 GB against a 2.00 GB limit in your log). In this situation Dask may pause execution on that worker or even start restarting your workers.
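To make the pause and restart behaviour concrete, here is a minimal sketch in plain Python (it does not call Dask) that compares the two readings from the log against memory thresholds. The threshold values (0.70 spill, 0.80 pause, 0.95 terminate) are assumed to be the defaults of the `distributed.worker.memory.*` settings; check your own configuration, since they can be changed. The `worker_action` helper is hypothetical and only illustrates the comparison, and it leaves out the `target` threshold, which is based on managed memory rather than process memory.

```python
# Assumed defaults for distributed.worker.memory.{spill,pause,terminate};
# verify against your own Dask configuration before relying on them.
THRESHOLDS = [
    (0.95, "terminate: the nanny restarts the worker"),
    (0.80, "pause: the worker stops starting new tasks"),
    (0.70, "spill: the worker tries to move stored data to disk"),
]


def worker_action(process_bytes, limit_bytes):
    """Return (fraction of limit used, highest threshold reached)."""
    fraction = process_bytes / limit_bytes
    for threshold, action in THRESHOLDS:  # ordered from highest to lowest
        if fraction >= threshold:
            return fraction, action
    return fraction, "ok: below all thresholds"


# The two readings from the log in the question (decimal GB and MB).
first = worker_action(3.91e9, 2.00e9)
second = worker_action(825.12e6, 2.00e9)

print(f"{first[0]:.0%} of limit -> {first[1]}")
print(f"{second[0]:.0%} of limit -> {second[1]}")
```

Under these assumed thresholds the first reading is far past the terminate level, while the second is back below every threshold, which is consistent with the "Resuming worker" message.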

The warning also says that Dask itself isn't holding on to any data, so there isn't much that it can do to help the situation (such as removing its own data). My guess is that some of the libraries that you are using are taking up a lot of memory. You might want to use Dask workers that have more than 2 GB of memory.
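One way to give workers more memory, sketched here under the assumption that you run a local cluster, is the `memory_limit` argument of `LocalCluster`, which applies per worker. The `start_cluster` helper and the `"4GB"` value are illustrative only; choose a limit that fits your machine and workload.

```python
def start_cluster(memory_limit="4GB", n_workers=2):
    """Start a local cluster whose workers each get `memory_limit`.

    The default values are hypothetical; adjust them to your machine.
    """
    # Imported lazily so this helper can be defined without dask installed.
    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(
        n_workers=n_workers,
        threads_per_worker=1,
        memory_limit=memory_limit,  # limit applies to each worker process
    )
    client = Client(cluster)
    return cluster, client


if __name__ == "__main__":
    cluster, client = start_cluster()
    try:
        print(client)  # summary of workers and their total memory
    finally:
        client.close()
        cluster.close()
```

If you start workers from the command line instead, the worker's `--memory-limit` option plays the same role.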

answered Sep 22 '22 17:09 by MRocklin