
How can I get my Luigi scheduler to utilize multiple cores with the parallel-scheduling flag?

I have the following line in my luigi.cfg file (on all nodes, scheduler and workers):

[core]
parallel-scheduling: true

However, when I monitor CPU utilization on my Luigi scheduler (a graph of roughly 4,000 tasks, handling requests from about 100 workers), it uses only a single core, and the single luigid thread often hits 100% CPU. My understanding is that this configuration variable should parallelize the scheduling of tasks.

The source suggests that this flag should indeed use multiple cores on the scheduler. In https://github.com/spotify/luigi/blob/master/luigi/interface.py#L194, a call is made to https://github.com/spotify/luigi/blob/master/luigi/worker.py#L498 to check the .complete() state of the task in parallel.

What am I missing to get my Luigi scheduler to utilize all of its cores?

captaincapsaicin asked Mar 25 '16 06:03


1 Answer

I just realized that the name parallel-scheduling is a bit confusing. It does not affect the scheduler, only the workers: when the option is set, each worker performs its scheduling phase in parallel.

As of today there is no way to utilize multiple cores for the central scheduler.
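To make the distinction concrete, here is a minimal sketch of what "the scheduling phase in parallel" means on the worker side: the worker checks the complete() state of many tasks at once before anything reaches the central scheduler. This is not Luigi's code. ToyTask, check_complete_in_parallel and tasks_to_report are hypothetical names made up for illustration, and a thread pool stands in for whatever pool Luigi actually uses.

```python
from multiprocessing.pool import ThreadPool


class ToyTask:
    """Hypothetical stand-in for a Luigi task; only complete() matters here."""

    def __init__(self, name, done):
        self.name = name
        self._done = done

    def complete(self):
        # A real task would check whether its output exists, which is
        # typically slow I/O (filesystem, database, HTTP).
        return self._done


def check_complete_in_parallel(tasks, pool_size=4):
    """Run complete() for every task concurrently, inside the worker.

    Returns a dict mapping task name to its completion state.
    """
    with ThreadPool(pool_size) as pool:
        # pool.map preserves the order of the input list.
        states = pool.map(lambda task: task.complete(), tasks)
    return {task.name: state for task, state in zip(tasks, states)}


def tasks_to_report(tasks, pool_size=4):
    """Names of the tasks that are not complete yet."""
    states = check_complete_in_parallel(tasks, pool_size)
    return [name for name, done in states.items() if not done]


tasks = [
    ToyTask("extract", True),
    ToyTask("transform", False),
    ToyTask("load", False),
]
pending = tasks_to_report(tasks)
print(pending)  # ['transform', 'load']
```

All of the concurrency in this sketch lives in the process that calls complete(), that is, the worker. The central scheduler only receives the resulting requests, which is why the flag does not change its CPU usage.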

Tarrasch answered Oct 18 '22 17:10