I am experiencing a lot of CPU throttling (see nginx graph below, other pods often 25% to 50%) in my Kubernetes cluster (k8s v1.18.12, running 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64 GNU/Linux).
Due to backports, I do not know whether my cluster contains the Linux kernel bug described in https://lkml.org/lkml/2019/5/17/581. How can I find out? Is there a simple way to check or measure?
If I have the bug, what is the best approach to get the fix? Or should I mitigate otherwise, e.g. not use CFS quota (--cpu-cfs-quota=false or no CPU limits) or reduce cfs_period_us and cfs_quota_us?
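For reference, the kubelet translates a container's CPU limit into exactly those cgroup files; they can be inspected on a node roughly like this (a sketch assuming cgroup v1 with the cgroupfs driver; <pod-uid> and <container-id> are placeholders, and the paths differ with the systemd cgroup driver):

# A 100m CPU limit with the default 100000us period corresponds to a quota of 10000us.
cat /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/pod<pod-uid>/<container-id>/cpu.cfs_period_us
cat /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/pod<pod-uid>/<container-id>/cpu.cfs_quota_us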
CPU Throttling Percentage for nginx (scaling horizontally around 15:00 and removing CPU limits around 19:30):
Recently I have been debugging this CPU throttling issue, and with the following five tests I was able to reproduce the bug on my kernel (Linux version 4.18.0-041800rc4-generic).
Each test runs for about 5000 ms against a 100 ms CFS period with a 10 ms quota per period (0.1 CPU), i.e. roughly 50 periods, and is intended to hit 100% throttling. A kernel without this bug should still report a total CPU usage of about 50 × 10 ms = 500 ms; an affected kernel reports considerably less.
Maybe you can try these tests to check whether your kernel will be over-throttled.
[Multi Thread Test 1]
./runfibtest 1; ./runfibtest
From <https://github.com/indeedeng/fibtest>
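If you want to run Test 1 outside of Docker, something like the following should work (a sketch: I'm assuming the repository's Makefile builds the fibtest binary that the runfibtest script wraps, and that root is needed for the cgroup setup it performs):

git clone https://github.com/indeedeng/fibtest
cd fibtest
make
sudo ./runfibtest 1
sudo ./runfibtest 8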
[Result]
Throttled
./runfibtest 1
Iterations Completed(M): 465
Throttled for: 52
CPU Usage (msecs) = 508
./runfibtest 8
Iterations Completed(M): 283
Throttled for: 52
CPU Usage (msecs) = 327
[Multi Thread Test 2]
docker run -it --rm --cpu-quota 10000 --cpu-period 100000 hipeteryang/fibtest:latest /bin/sh -c "runfibtest 8 && cat /sys/fs/cgroup/cpu,cpuacct/cpu.stat && cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage && cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage_percpu"
[Result]
Throttled
Iterations Completed(M): 192
Throttled for: 53
CPU Usage (msecs) = 227
nr_periods 58
nr_throttled 56
throttled_time 10136239267
267434463
209397643 2871651 8044402 4226146 5891926 5532789 27939741 4104364
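The nr_periods, nr_throttled, and throttled_time lines above come from the cgroup's cpu.stat that the command reads inside the container; to turn them into a throttling ratio you could pipe the same file through a small awk helper (illustrative only, same in-container path as above):

awk '/nr_periods/ {p=$2} /nr_throttled/ {t=$2} END {printf "throttled in %d of %d periods (%.1f%%)\n", t, p, 100*t/p}' /sys/fs/cgroup/cpu,cpuacct/cpu.stat

With the numbers above that is 56 of 58 periods, i.e. about 97% of the periods were throttled.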
[Multi Thread Test 3]
docker run -it --rm --cpu-quota 10000 --cpu-period 100000 hipeteryang/stress-ng:cpu-delay /bin/sh -c "stress-ng --taskset 0 --cpu 1 --timeout 5s & stress-ng --taskset 1-7 --cpu 7 --cpu-load-slice -1 --cpu-delay 10 --cpu-method fibonacci --timeout 5s && cat /sys/fs/cgroup/cpu,cpuacct/cpu.stat && cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage && cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage_percpu"
[Result]
Throttled
nr_periods 56
nr_throttled 53
throttled_time 7893876370
379589091
330166879 3073914 6581265 724144 706787 5605273 29455102 3849694
For the following Kubernetes tests, you can use kubectl logs <pod-name> to get the result once the job is done.
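For example, with the Job manifests below (the manifest file name here is just an illustrative placeholder):

kubectl apply -f fibtest-job.yaml
kubectl wait --for=condition=complete job/fibtest --timeout=120s
kubectl logs job/fibtest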
[Multi Thread Test 4]
apiVersion: batch/v1
kind: Job
metadata:
  name: fibtest
  namespace: default
spec:
  template:
    spec:
      containers:
      - name: fibtest
        image: hipeteryang/fibtest
        command: ["/bin/bash", "-c", "runfibtest 8 && cat /sys/fs/cgroup/cpu,cpuacct/cpu.stat && cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage && cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage_percpu"]
        resources:
          requests:
            cpu: "50m"
          limits:
            cpu: "100m"
      restartPolicy: Never
[Result]
Throttled
Iterations Completed(M): 195
Throttled for: 52
CPU Usage (msecs) = 230
nr_periods 56
nr_throttled 54
throttled_time 9667667360
255738571
213621103 4038989 2814638 15869649 4435168 5459186 4549675 5437010
[Multi Thread Test 5]
apiVersion: batch/v1
kind: Job
metadata:
  name: stress-ng-test
  namespace: default
spec:
  template:
    spec:
      containers:
      - name: stress-ng-test
        image: hipeteryang/stress-ng:cpu-delay
        command: ["/bin/bash", "-c", "stress-ng --taskset 0 --cpu 1 --timeout 5s & stress-ng --taskset 1-7 --cpu 7 --cpu-load-slice -1 --cpu-delay 10 --cpu-method fibonacci --timeout 5s && cat /sys/fs/cgroup/cpu,cpuacct/cpu.stat && cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage && cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage_percpu"]
        resources:
          requests:
            cpu: "50m"
          limits:
            cpu: "100m"
      restartPolicy: Never
[Result]
Throttled
nr_periods 53
nr_throttled 50
throttled_time 6827221694
417331622
381601924 1267814 8332150 3933171 13654620 184120 6376208 2623172
Feel free to leave any comment, I’ll reply as soon as possible.
Since the fix was backported to many older kernel versions, I do not know how to easily look up whether e.g. 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64 GNU/Linux has the fix.
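One rough way to try is grepping the Debian changelog of the installed kernel package for the backported patch. This is only a sketch: it assumes the backport is listed under the upstream patch title ("sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices") and that the changelog sits at the usual Debian path, so a missing match is not conclusive.

uname -r
zgrep -i "expiration of cpu-local slices" /usr/share/doc/linux-image-$(uname -r)/changelog.Debian.gz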
But you can measure whether your CFS is working smoothly or is throttling too much, as described in https://gist.github.com/bobrik/2030ff040fad360327a5fab7a09c4ff1: run cfs.go with suitable settings for its sleeps and iterations as well as the CFS settings, e.g.

docker run --rm -it --cpu-quota 20000 --cpu-period 100000 -v $(pwd):$(pwd) -w $(pwd) golang:1.9.2 go run cfs.go -iterations 100 -sleep 1000ms

Check whether each burn took about 5ms. If not, your CFS is throttling too much. This could be e.g. due to the original bug 198197 (see https://bugzilla.kernel.org/show_bug.cgi?id=198197) or the regression introduced by the fix for bug 198197 (for details see https://lkml.org/lkml/2019/5/17/581).

This measurement approach is also taken in https://github.com/kubernetes/kops/issues/8954, showing that Linux kernel 4.9.0-11-amd64 is throttling too much (however, with an earlier Debian 4.9.189-3+deb9u1 (2019-09-20) than your Debian 4.9.189-3+deb9u2 (2019-11-11)).
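You can also look at the counters the kernel already keeps for the pods that are running: a high nr_throttled relative to nr_periods in a container's cpu.stat points in the same direction as the cfs.go measurement. A minimal sketch, assuming cgroup v1 and the default kubepods hierarchy on the node (the exact cgroup layout depends on your cgroup driver):

# Print CFS statistics for every pod/container cgroup on this node.
find /sys/fs/cgroup/cpu,cpuacct/kubepods* -name cpu.stat | while read -r f; do
  echo "== $f"
  cat "$f"
done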