 

How to set up logging on dask distributed workers?

Tags:

dask

After upgrading dask distributed to version 1.15.0, my logging stopped working.

I've used logging.config.dictConfig to initialize Python's logging facilities, and previously these settings propagated to all workers. But after the upgrade it no longer works.

If I call dictConfig right before every log call on every worker, it works, but that's not a proper solution.
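For reference, a minimal sketch of the kind of dictConfig setup I mean (the formatter, handler, and levels here are illustrative, not my actual config):

import logging.config

def init_logging():
    # Illustrative dictConfig; the real configuration is more elaborate.
    logging.config.dictConfig({
        'version': 1,
        'formatters': {
            'default': {
                'format': '%(asctime)s %(levelname)s %(name)s: %(message)s',
            },
        },
        'handlers': {
            'console': {
                'class': 'logging.StreamHandler',
                'formatter': 'default',
            },
        },
        'root': {'handlers': ['console'], 'level': 'INFO'},
    })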

So the question is: how do I initialize logging on every worker before my computation graph starts executing, and do it only once per worker?

UPDATE:

This hack worked on a dummy example but didn't make a difference on my system:

import distributed

def init_logging():
    # dictConfig-based logging initialization, as sketched above
    ...

client = distributed.Client()
# Map a dummy task over the worker addresses from client.ncores(),
# hoping each worker executes init_logging at least once; there is
# no guarantee these tasks land one per worker.
client.map(lambda _: init_logging(), client.ncores())

UPDATE 2:

After digging through the documentation, this fixed the problem:

client.run(init_logging)
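As a sanity check (my own addition, not from the docs): client.run returns a dict keyed by worker address, so you can confirm the configuration landed everywhere:

import logging
# Returns {worker_address: handler_count}; nonzero counts mean the
# handlers installed by init_logging are present on every worker.
client.run(lambda: len(logging.getLogger().handlers))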

So the question now is: Is this a proper way to solve this problem?

Asked Jan 05 '17 by Alexander Reshytko



1 Answer

As of version 1.15.0 we now fork workers from a clean process, so changes that you make to your process prior to calling Client() won't affect forked workers. For more information search for forkserver here: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
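A minimal, non-Dask illustration of the forkserver behavior (assumes a Unix system; this example is mine, not from the docs):

import logging
import multiprocessing

def report_root_level():
    # Runs in a child forked from the clean fork server process, so it
    # sees the default root level (WARNING == 30), not the parent's DEBUG.
    print(logging.getLogger().level)

if __name__ == '__main__':
    logging.getLogger().setLevel(logging.DEBUG)  # parent-only change
    ctx = multiprocessing.get_context('forkserver')
    p = ctx.Process(target=report_root_level)
    p.start()
    p.join()  # prints 30, not 10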

Your solution of using Client.run looks good to me. Client.run is currently (as of version 1.15.0) the best way to call a function on all currently active workers.
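Putting it together, the pattern looks something like this (init_logging as in the question; the logger name and task are just for demonstration):

import logging
import distributed

def init_logging():
    ...  # dictConfig-based setup, as in the question

def task(x):
    # Uses the handlers/formatters installed by init_logging on this worker
    logging.getLogger('myapp').info('processing %s', x)
    return x * 2

client = distributed.Client()
client.run(init_logging)  # runs once on every currently active worker
results = client.gather(client.map(task, range(10)))

One caveat: Client.run only reaches workers that are active at the moment of the call, so workers that join the cluster later will start without this configuration.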

Distributed Systems

It is worth noting that here your workers are forked from the same process on a single computer. The trick above will not work in a distributed setting. I'm adding this note in case people come to this question asking how to handle logging with Dask in a cluster context.

Generally Dask does not move logs around. Instead, it is common that whatever mechanism you used to launch Dask handles this: job schedulers like SGE/SLURM/Torque/PBS, cloud systems like YARN/Mesos/Marathon/Kubernetes, and the dask-ssh tool all do this.

Answered Sep 30 '22 by MRocklin