TL;DR: Using a real-time Linux kernel with NO_HZ_FULL, I need to isolate a process in order to get deterministic results, but /proc/interrupts tells me there are still local timer interrupts (among others). How can I disable them?
Long version:
I want to make sure my program is not being interrupted, so I am trying to use a real-time Linux kernel. I'm using the real-time version of Arch Linux (linux-rt on AUR) and I modified the kernel configuration to select the following options:
CONFIG_NO_HZ_FULL=y
CONFIG_NO_HZ_FULL_ALL=y
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_ALL=y
Then I rebooted my computer into this real-time kernel with the following options:
nmi_watchdog=0
rcu_nocbs=1
nohz_full=1
isolcpus=1
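After rebooting, a quick way to confirm that these parameters took effect is to check the kernel command line and the sysfs CPU lists (a sketch; the sysfs paths below exist on recent kernels, adjust if yours differs):

```shell
# Check that the kernel actually booted with the expected parameters
cat /proc/cmdline
# On recent kernels these sysfs files report the effective CPU lists
cat /sys/devices/system/cpu/nohz_full   # CPUs running full dynticks
cat /sys/devices/system/cpu/isolated    # CPUs isolated via isolcpus=
```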
I also disabled the following options in the BIOS:
C-states
Intel SpeedStep
Turbo mode
VT-x
VT-d
Hyper-Threading
My CPU (i7-6700, 3.40 GHz) has 4 cores (8 logical CPUs with Hyper-Threading), and I can see CPU0, CPU1, CPU2 and CPU3 in the /proc/interrupts file.
CPU1 is isolated by the isolcpus kernel parameter, and I want to disable the local timer interrupts on this CPU.
I thought a real-time kernel with CONFIG_NO_HZ_FULL and CPU isolation (isolcpus) was enough to do it, and I tried to check by running these commands:
cat /proc/interrupts | grep LOC > ~/tmp/log/overload_cpu1
taskset -c 1 ./overload
cat /proc/interrupts | grep LOC >> ~/tmp/log/overload_cpu1
where the overload process is:
***overload.c:***
int main(void)
{
    /* busy loop; volatile keeps the compiler from optimizing it away */
    for (volatile int i = 0; i < 100; ++i)
        for (volatile int j = 0; j < 100000000; ++j)
            ;
    return 0;
}
The file overload_cpu1 contains the result:
LOC: 234328 488 12091 11299 Local timer interrupts
LOC: 239072 651 12215 11323 Local timer interrupts
meaning 651 - 488 = 163 interrupts from the local timer, not 0...
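As a sanity check, the subtraction can be scripted for all four CPUs at once; a sketch using the two LOC lines above (fields 2-5 of each line are the per-CPU counters):

```shell
# Compute per-CPU deltas between the two LOC snapshots quoted above
before="LOC: 234328 488 12091 11299 Local timer interrupts"
after="LOC: 239072 651 12215 11323 Local timer interrupts"
echo "$before $after" | awk '{
    # fields 2-5: first snapshot; fields 10-13: second snapshot
    for (i = 2; i <= 5; i++) printf "CPU%d: %d\n", i - 2, $(i + 8) - $i
}'
```

For these inputs the second output line is `CPU1: 163`, the same delta computed by hand above.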
For comparison, I did the same experiment but changed the core where my overload process runs (I kept watching interrupts on CPU1):
taskset -c 0 : 8 interrupts
taskset -c 1 : 163 interrupts
taskset -c 2 : 7 interrupts
taskset -c 3 : 8 interrupts
One of my questions is: why are there not 0 interrupts? And why is the number of interrupts bigger when my process runs on CPU1? (I thought NO_HZ_FULL would prevent interrupts if my process was alone: "The CONFIG_NO_HZ_FULL=y Kconfig option causes the kernel to avoid sending scheduling-clock interrupts to CPUs with a single runnable task" — https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt.)
Maybe an explanation is that there are other processes running on CPU1. I checked using the ps command:
CLS CPUID RTPRIO PRI NI CMD PID
TS 1 - 19 0 [cpuhp/1] 18
FF 1 99 139 - [migration/1] 20
TS 1 - 19 0 [rcuc/1] 21
FF 1 1 41 - [ktimersoftd/1] 22
TS 1 - 19 0 [ksoftirqd/1] 23
TS 1 - 19 0 [kworker/1:0] 24
TS 1 - 39 -20 [kworker/1:0H] 25
FF 1 1 41 - [posixcputmr/1] 28
TS 1 - 19 0 [kworker/1:1] 247
TS 1 - 39 -20 [kworker/1:1H] 501
As you can see, there are threads on CPU1. Is it possible to disable these processes? I guess it is, because if not, NO_HZ_FULL would never work, right?
Tasks with class TS don't disturb me, because they don't have priority over SCHED_FIFO tasks and I can set that policy for my program. The same goes for tasks with class FF and priority less than 99.
However, you can see migration/1, which is SCHED_FIFO with priority 99. Maybe these processes can cause interrupts when they run. That would explain the few interrupts when my process is on CPU0, CPU2 and CPU3 (8, 7 and 8 interrupts respectively), but it also means these processes are not running very often, and so it doesn't explain why there are many interrupts when my process runs on CPU1 (163 interrupts).
I also ran the same experiment, but with my overload process under SCHED_FIFO, and I got:
taskset -c 0 : 1
taskset -c 1 : 4063
taskset -c 2 : 1
taskset -c 3 : 0
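For reference, one SCHED_FIFO run can be done with chrt; a sketch of a single measurement (priority 98 is an assumption, chosen to stay below migration/1's priority 99):

```shell
grep LOC /proc/interrupts                    # snapshot before
sudo chrt --fifo 98 taskset -c 1 ./overload  # run under SCHED_FIFO on CPU1
grep LOC /proc/interrupts                    # snapshot after
```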
In this configuration there are more interrupts when my process uses the SCHED_FIFO policy on CPU1, and fewer on the other CPUs. Do you know why?
The thing is that a full-tickless CPU (a.k.a. adaptive-ticks, configured with nohz_full=) still receives some ticks.
Most notably, the scheduler requires a timer on an isolated full-tickless CPU for updating some state every second or so.
This is a documented limitation (as of 2019):
Some process-handling operations still require the occasional scheduling-clock tick. These operations include calculating CPU load, maintaining sched average, computing CFS entity vruntime, computing avenrun, and carrying out load balancing. They are currently accommodated by scheduling-clock tick every second or so. On-going work will eliminate the need even for these infrequent scheduling-clock ticks.
(source: Documentation/timers/NO_HZ.txt, cf. the LWN article (Nearly) full tickless operation in 3.10 from 2013 for some background)
A more accurate method to measure the local timer interrupts (the LOC row in /proc/interrupts) is to use perf. For example:
$ perf stat -a -A -e irq_vectors:local_timer_entry ./my_binary
Where my_binary has threads pinned to the isolated CPUs that utilize the CPU non-stop without invoking syscalls, for, say, 2 minutes.
There are other sources of additional local timer ticks (when there is just 1 runnable task).
For example, the collection of VM stats: by default they are collected every second. Thus, I can decrease my LOC interrupts by setting a higher value, e.g.:
# sysctl vm.stat_interval=60
Another source is the periodic check whether the TSCs on the different CPUs drift apart - you can disable it with the following kernel option:
tsc=reliable
(Only apply this option if you really know that your TSCs don't drift.)
You might find other sources by recording traces with ftrace (while your test binary is running).
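A minimal ftrace sketch for that (assumes root and tracefs mounted at /sys/kernel/debug/tracing; hrtimer_expire_entry is a standard event in the timer event group):

```shell
cd /sys/kernel/debug/tracing
echo 0 > tracing_on && echo > trace                 # stop and clear the buffer
echo 1 > events/timer/hrtimer_expire_entry/enable   # log every hrtimer expiry
echo 1 > tracing_on
taskset -c 1 ./overload                             # run the test workload
echo 0 > tracing_on
awk '/\[001\]/' trace | head                        # expiries handled on CPU1
```

The callback names in the trace output tell you which subsystem armed each timer.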
Since it came up in the comments: Yes, the SMI is fully transparent to the kernel. It doesn't show up as NMI. You can only detect an SMI indirectly.