keep getting "distributed.utils_perf - WARNING - full garbage collections took 19% CPU time..."

I keep getting the warning "distributed.utils_perf - WARNING - full garbage collections took 19% CPU time recently" after my Dask code finishes. I am using Dask for a large seismic-data computation, and after the computation I write the results to disk. The writing-to-disk part takes much longer than the computation itself. Before writing the data to disk I call client.close(), so I assume I am done with Dask, but the warning keeps coming. While computing I got the warning 3-4 times, but while writing the data to disk I get it every second. How can I get rid of this annoying warning? Thanks.
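Update: the message itself shows it is emitted by the distributed.utils_perf logger, so as a stopgap I can silence it with standard Python logging (a minimal sketch that hides the symptom rather than fixing the underlying GC pressure):

import logging

# Raise the threshold of the logger that emits the GC warning so that
# WARNING-level messages from it are no longer printed.
logging.getLogger("distributed.utils_perf").setLevel(logging.ERROR)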

asked Oct 18 '19 by NSJ

2 Answers

I had been struggling with this warning too: I would get many of them and then the workers would die. In my case it was because I had some custom Python functions for aggregating my data that were handling large Python objects (dicts). It makes sense that so much time was being spent on garbage collection if I was creating these large objects.

I refactored my code so that more of the computation was done in parallel before the results were aggregated together, and the warnings went away, as sketched below.
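A minimal sketch of that idea (the process_chunk/combine names are hypothetical; my original code isn't shown here): do the heavy work per chunk in parallel and combine only small results, for example with a pairwise tree reduction instead of funnelling everything into one large dict.

import dask
from dask import delayed

@delayed
def process_chunk(chunk):
    # Hypothetical stand-in: do as much of the real work as possible
    # here, in parallel, so only a small result needs aggregating.
    return sum(chunk)

@delayed
def combine(a, b):
    # Combine two small partial results instead of building one huge dict.
    return a + b

chunks = [list(range(i, i + 1000)) for i in range(0, 10_000, 1000)]
partials = [process_chunk(c) for c in chunks]

# Pairwise tree reduction keeps every intermediate object small.
while len(partials) > 1:
    leftover = [partials[-1]] if len(partials) % 2 else []
    partials = [combine(a, b) for a, b in zip(partials[::2], partials[1::2])] + leftover

print(partials[0].compute())  # 49995000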

I looked at the progress chart on the status page of the Dask dashboard to see which tasks were taking a long time to process (Dask tries to name tasks after the function in your code that created them, which can help, but the names aren't always that descriptive). From there I could figure out which part of my code I needed to optimise; see the sketch below for getting at the dashboard and naming tasks explicitly.
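A short sketch of that workflow (the explicit dask_key_name labels are my own addition, not something I needed here): the client exposes the dashboard URL, and delayed calls can be given readable names so they are easy to spot on the status page.

import dask
from dask.distributed import Client

client = Client()             # local cluster
print(client.dashboard_link)  # open this URL for the progress/task-stream charts

def load(i):
    return list(range(i * 1000, (i + 1) * 1000))

# Explicit, human-readable task keys show up in the dashboard plots.
tasks = [dask.delayed(load)(i, dask_key_name=f"load-{i}") for i in range(4)]
results = dask.compute(*tasks)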

answered Oct 7 '22 by Alexander Townsend


The same was happening for me in Colab, where I started the session with client = Client(n_workers=40, threads_per_worker=2).

I terminated all my Colab sessions, then reinstalled and re-imported all the Dask libraries:

!pip install dask
!pip install cloudpickle
!pip install 'dask[dataframe]'
!pip install 'dask[complete]'   # 'complete' already pulls in the extras above

from dask.distributed import Client
import dask.dataframe as dd
import dask.multiprocessing

Now everything is working fine and I'm not facing any issues. I don't know exactly how this solved my issue :D
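If it comes back, one guess (my assumption, not something I verified) is that n_workers=40 heavily oversubscribes a Colab VM, which typically has only around 2 vCPUs; recreating the client with a pool sized to the machine might help:

from dask.distributed import Client

# Hypothetical right-sizing for a standard Colab runtime (~2 vCPUs);
# adjust to the actual core count of your machine.
client = Client(n_workers=2, threads_per_worker=2)
print(client.dashboard_link)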

answered Oct 7 '22 by Rajdeep Borgohain