[mapr@impetus-i0057 latest_code_deepak]$ dask-worker 172.26.32.37:8786
distributed.nanny - INFO - Start Nanny at: 'tcp://172.26.32.36:50930'
distributed.diskutils - WARNING - Found stale lock file and directory '/home/mapr/latest_code_deepak/dask-worker-space/worker-PwEseH', purging
distributed.worker - INFO - Start worker at: tcp://172.26.32.36:41694
distributed.worker - INFO - Listening to: tcp://172.26.32.36:41694
distributed.worker - INFO - bokeh at: 172.26.32.36:8789
distributed.worker - INFO - nanny at: 172.26.32.36:50930
distributed.worker - INFO - Waiting to connect to: tcp://172.26.32.37:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Threads: 8
distributed.worker - INFO - Memory: 33.52 GB
distributed.worker - INFO - Local Directory: /home/mapr/latest_code_deepak/dask-worker-space/worker-AkBPtM
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Registered to: tcp://172.26.32.37:8786
distributed.worker - INFO - -------------------------------------------------
What is the default directory where a dask-worker keeps its temporary files, such as task results or files that were uploaded from the client using the upload_file() method?
For example:

def my_task_running_on_dask_worker():
    # fetch the file from HDFS
    # process the file
    # store the file back into HDFS
    ...
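A minimal sketch of the call pattern the question refers to, assuming a client connected to the scheduler shown in the logs above; the module name my_helpers.py is purely hypothetical:

from dask.distributed import Client

client = Client("172.26.32.37:8786")                     # connect to the scheduler from the logs above
client.upload_file("my_helpers.py")                      # the file is copied into each worker's local directory
future = client.submit(my_task_running_on_dask_worker)   # run the function defined above on a worker
result = future.result()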
By default a dask-worker places its files in a directory under ./dask-worker-space/worker-########, where ######## is a random string specific to that particular worker. You can change this location using the --local-directory keyword of the dask-worker executable.
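For example, the same worker could be started with an explicit scratch path (the path shown here is only an illustration):

$ dask-worker 172.26.32.37:8786 --local-directory /data/scratch/dask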
The warning that you're seeing in this line
distributed.diskutils - WARNING - Found stale lock file and directory '/home/mapr/latest_code_deepak/dask-worker-space/worker-PwEseH', purging
says that a Dask worker noticed that the directory for another worker wasn't cleaned up, presumably because it failed in some hard way. This worker is cleaning up the space left behind from the previous worker.
You can see which worker creates which directory either by looking at the logs of each worker (they print their local directory at startup):
$ dask-worker localhost:8786
distributed.worker - INFO - Start worker at: tcp://127.0.0.1:36607
...
distributed.worker - INFO - Local Directory: /home/mrocklin/dask-worker-space/worker-ks3mljzt
Or programmatically, by calling client.scheduler_info():
>>> client.scheduler_info()
{'address': 'tcp://127.0.0.1:34027',
'id': 'Scheduler-bd88dfdf-e3f7-4b39-8814-beae779248f1',
'services': {'bokeh': 8787},
'type': 'Scheduler',
'workers': {'tcp://127.0.0.1:33143': {'cpu': 7.7,
...
'local_directory': '/home/mrocklin/dask-worker-space/worker-8kvk_l81',
},
...
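As a small follow-up sketch, assuming the key names shown in the output above, you can pull the local directory of every worker out of that dictionary:

info = client.scheduler_info()
for address, worker in info["workers"].items():
    print(address, worker["local_directory"])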