I have the following line in my luigi.cfg file (on all nodes, scheduler and workers):
[core]
parallel-scheduling: true
However, when I monitor CPU utilization on my Luigi scheduler (with a graph of ~4000 tasks, handling requests from ~100 workers), it uses only a single core, with the single luigid thread often hitting 100% CPU utilization. My understanding is that this configuration variable should parallelize the scheduling of tasks.
The source suggests that this flag should indeed use multiple cores on the scheduler. In https://github.com/spotify/luigi/blob/master/luigi/interface.py#L194, a call is made to https://github.com/spotify/luigi/blob/master/luigi/worker.py#L498 to check the .complete() state of the task in parallel.
What am I missing to get my Luigi scheduler to utilize all of its cores?
I just realized that the name parallel-scheduling is a bit confusing. It does not affect the scheduler, only the workers. Workers will perform their scheduling phase in parallel when that option is set.
As of today there is no way to utilize multiple cores for the central scheduler.