 

Linux HZ and fair scheduler timeslice

In sched_fair.c there is (values in nanoseconds):

unsigned int sysctl_sched_latency = 5000000ULL; // 5 ms

unsigned int sysctl_sched_min_granularity = 1000000ULL; // 1 ms

I understand that the Linux fair-scheduler timeslice varies with nr_running and with the relative weight of the fair task, but from studying the code, my understanding is that the main idea is to keep the timeslice between 1 and 5 ms. Please correct me if I have misunderstood. I must be wrong somewhere, but I just cannot figure out where!

Also, HZ (the number of system ticks, i.e. timer interrupts, per second) is normally 100 or 200 on ARM machines (and most non-desktop machines too), which gives a tick period of 5 to 10 ms.

The timeslice is enforced by starting rq->hrtick_timer in set_next_entity() every time a fair task is scheduled to run, and by invoking resched_task() in the timer's callback function, hrtick(). This timer is simply one of the queued timers that the timer IRQ handler processes on every tick, timer_tick()...run_local_timer(). There seems to be no other hidden secret.

So how can we get a timeslice shorter than 5 ms? Please help me understand this. Thank you very much!

asked Nov 03 '22 16:11 by HelloYou



1 Answer

As stated in Robert Love's Linux Kernel Development, the only way to get a shorter timeslice is to increase the number of running processes (or to run a process with lower priority than the others).

Increasing the number of running processes forces CFS to shorten each timeslice so that the target latency is still met (although the timeslice is bounded below by the minimum granularity). However, there is no guarantee that a process will be preempted exactly when its timeslice expires, because time accounting is driven by timer interrupts.

Increasing the value of HZ makes timer interrupts happen more frequently, which makes time accounting more precise, so rescheduling can occur more frequently.


The vruntime variable stores the virtual runtime of a process, which is its actual runtime weighted by the process's load weight (derived from its nice value). On an ideal multitasking system, the vruntime of all processes would be identical: every task would have received an equal, fair share of the processor.

Typically, the timeslice is the target latency divided by the number of running processes. But as the number of running processes approaches infinity, the timeslice approaches 0. Because this would eventually result in unacceptable switching costs, CFS imposes a floor on the timeslice assigned to each process. This floor is called the minimum granularity. So, for processes of equal weight, the timeslice is a value between sysctl_sched_min_granularity and sysctl_sched_latency (see sched_slice()).

The vruntime variable is updated by update_curr(), which is invoked periodically by the system timer and also whenever a process becomes runnable or blocks (becoming unrunnable).

To drive preemption between tasks, scheduler_tick() calls task_tick_fair() on each timer interrupt, which in turn calls entity_tick(). entity_tick() calls update_curr() to update the process's vruntime and then calls check_preempt_tick(). check_preempt_tick() checks whether the current runtime is greater than the ideal runtime (the timeslice) and, if so, calls resched_task(), which sets the TIF_NEED_RESCHED flag.

When TIF_NEED_RESCHED is set, schedule() is called at the next possible opportunity (for example, when returning from the interrupt to user space).

So, with a higher value of HZ, timer interrupts happen more frequently, making time accounting more precise and allowing the scheduler to reschedule tasks more often.

answered Nov 09 '22 13:11 by Alexey Shmalko
