I'm running a process (on a Linux 3.x-based OS) in which:
It's possible that there's oversubscription (i.e. more worker threads than twice the number of cores on an Intel processor with HT). Now, what I'm seeing is that the 'manager' threads don't get processor time frequently enough. They're not entirely 'starved'; I just want to give them a boost. So, naturally, I thought about setting different thread priorities (I'm on Linux), but then I noticed the different choices of thread scheduler and their effects. At this point I got confused, or rather, it's not clear to me:
Notes:
The obvious solution is to change the "manager" threads' scheduler to RT (the real-time scheduler, which provides the SCHED_DEADLINE/SCHED_FIFO policies). In that case the "manager" threads will always have a higher priority than most threads in the system, so they will almost always get a CPU when they need it.
However, there is another solution that allows you to stay on the CFS scheduler. Your description of the purpose of the "worker" threads is similar to batch scheduling (in ancient times, when computers were large, a user had to put their job into a queue and wait hours until it was done). Linux CFS supports batch jobs via the SCHED_BATCH policy and interactive jobs via the SCHED_NORMAL policy.
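For concreteness, here is a minimal sketch (my own, not from the original post) of switching a thread's policy with pthread_setschedparam(); the manager_tid/worker_tid handles are hypothetical, and SCHED_FIFO only succeeds with CAP_SYS_NICE or a suitable RLIMIT_RTPRIO:

/* Minimal sketch: set a pthread's scheduling policy. Assumes Linux/glibc;
 * SCHED_BATCH and SCHED_NORMAL require sched_priority == 0, while
 * SCHED_FIFO takes 1..99 and needs CAP_SYS_NICE (or RLIMIT_RTPRIO). */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

static void set_policy(pthread_t tid, int policy, int prio)
{
    struct sched_param sp;
    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = prio;

    int err = pthread_setschedparam(tid, policy, &sp);
    if (err != 0)
        fprintf(stderr, "pthread_setschedparam: %s\n", strerror(err));
}

/* Hypothetical usage:
 *   set_policy(manager_tid, SCHED_FIFO, 9);    boost the managers
 *   set_policy(worker_tid,  SCHED_BATCH, 0);   demote the workers
 */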
There is also a useful comment in the kernel code (kernel/sched/fair.c):
/*
* Batch and idle tasks do not preempt non-idle tasks (their preemption
* is driven by the tick):
*/
if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
return;
So when "manager" thread or some other event awake "worker", latter will get CPU only if there are free CPUs in system or when "manager" will exhaust its timeslice (to tune it change the weight of task).
It seems that your problem couldn't be solved without changing of scheduler policies. If "worker" threads are very busy and "manager" are rarely wake up, they would get same vruntime
bonus, so "worker" would always preempt "manager" threads (but you may increase their weight, so they would exhaust their bonus faster).
UPD 12.02.2015: I have run some experiments.

I have a server with 2 x Intel Xeon E5-2420 CPUs, which gives us 24 hardware threads. To simulate the two threadpools I used my own TSLoad workload generator (and fixed a couple of bugs while running the experiments :)).
There were two threadpools: tp_manager with 4 threads and tp_worker with 30 threads, both running busy_wait workloads (just for(i = 0; i < N; ++i);) but with a different number of loop cycles. tp_worker works in benchmark mode, so it runs as many requests as it can and occupies 100% of the CPU.
Here is a sample config: https://gist.github.com/myaut/ad946e89cb56b0d4acde
Kernel 3.12 (vanilla with debug config)
EXP | MANAGER | WORKER
| sched wait service | sched service
| policy time time | policy time
33 | NORMAL 0.045 2.620 | WAS NOT RUNNING
34 | NORMAL 0.131 4.007 | NORMAL 125.192
35 | NORMAL 0.123 4.007 | BATCH 125.143
36 | NORMAL 0.026 4.007 | BATCH (nice=10) 125.296
37 | NORMAL 0.025 3.978 | BATCH (nice=19) 125.223
38 | FIFO (prio=9) -0.022 3.991 | NORMAL 125.187
39 | core:0:0 0.037 2.929 | !core:0:0 136.719
Kernel 3.2 (stock Debian)
EXP | MANAGER | WORKER
| sched wait service | sched service
| policy time time | policy time
46 | NORMAL 0.032 2.589 | WAS NOT RUNNING
45 | NORMAL 0.081 4.001 | NORMAL 125.140
47 | NORMAL 0.048 3.998 | BATCH 125.205
50 | NORMAL 0.023 3.994 | BATCH (nice=10) 125.202
48 | NORMAL 0.033 3.996 | BATCH (nice=19) 125.223
42 | FIFO (prio=9) -0.008 4.016 | NORMAL 125.110
39 | core:0:0 0.035 2.930 | !core:0:0 135.990
Some notes:
Reducing the weight of the "worker" threads (nice can do it indirectly) slightly reduces the manager wait time.

In addition to myaut's answer, you could also bind the manager to specific CPUs (sched_setaffinity) and the workers to the rest. Depending on your exact use case that can be very wasteful, of course.
Link: Thread binding the CPU core
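A minimal sketch of that binding, assuming Linux/glibc; the CPU numbers and the helper name are placeholders:

/* Minimal sketch: pin the calling thread to one CPU with sched_setaffinity().
 * Passing pid 0 means "the calling thread" on Linux. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

static int pin_current_thread(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    if (sched_setaffinity(0, sizeof(set), &set) == -1) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}

/* A manager thread could call pin_current_thread(0) at startup, while
 * workers get pinned to the remaining CPUs (as in the core:0:0 experiment). */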
Explicit yielding is generally not necessary; in fact, it is often discouraged. To quote Robert Love in "Linux System Programming":
In practice, there are few legitimate uses of sched_yield() on a proper preemptive multitasking system such as Linux. The kernel is fully capable of making the optimal and most efficient scheduling decisions - certainly, the kernel is better equipped than an individual application to decide what to preempt and when.
The exception he mentions is when you are waiting on external events, for example ones caused by the user, by hardware, or by another process. That is not the case in your example.
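To illustrate the point, here is a minimal sketch (hypothetical names, not from the quoted book) of a thread blocking on a condition variable for an external event instead of spinning on sched_yield():

/* Minimal sketch: block until another thread signals that work is ready,
 * instead of yielding in a loop. */
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool work_ready = false;

void wait_for_work(void)
{
    pthread_mutex_lock(&lock);
    while (!work_ready)                 /* no spinning, no sched_yield() */
        pthread_cond_wait(&cond, &lock);
    work_ready = false;
    pthread_mutex_unlock(&lock);
}

void signal_work(void)
{
    pthread_mutex_lock(&lock);
    work_ready = true;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}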