I am experiencing a lot of CPU throttling (see nginx graph below, other pods often 25% to 50%) in my Kubernetes cluster (k8s v1.18.12, running 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64 GNU/Linux).
Due to backports, I do not know whether my cluster contains the Linux kernel bug described in https://lkml.org/lkml/2019/5/17/581. How can I find out? Is there a simple way to check or measure?
If I have the bug, what is the best approach to get the fix? Or should I mitigate otherwise, e.g. not use CFS quota (--cpu-cfs-quota=false or no CPU limits) or reduce cfs_period_us and cfs_quota_us?
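For reference, the kubelet translates a container's CPU limit into exactly those cgroup files; they can be inspected on a node roughly like this (a sketch assuming cgroup v1 with the cgroupfs driver; <pod-uid> and <container-id> are placeholders, and the paths differ with the systemd cgroup driver):

# A 100m CPU limit with the default 100000us period corresponds to a quota of 10000us.
cat /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/pod<pod-uid>/<container-id>/cpu.cfs_period_us
cat /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/pod<pod-uid>/<container-id>/cpu.cfs_quota_us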
CPU Throttling Percentage for nginx (scaling horizontally around 15:00 and removing CPU limits around 19:30):
Recently I have been debugging this CPU throttling issue, and with the following five tests I was able to reproduce the bug on my kernel (Linux version 4.18.0-041800rc4-generic).
Each test runs for about 5000 ms against a 100 ms CFS period with a 10 ms quota per period (0.1 CPU), i.e. roughly 50 periods, and is intended to hit 100% throttling. A kernel without this bug should still report a total CPU usage of about 50 × 10 ms = 500 ms; an affected kernel reports considerably less.
Maybe you can try these tests to check whether your kernel will be over-throttled.
[Multi Thread Test 1]
./runfibtest 1; ./runfibtest
From <https://github.com/indeedeng/fibtest>
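If you want to run Test 1 outside of Docker, something like the following should work (a sketch: I'm assuming the repository's Makefile builds the fibtest binary that the runfibtest script wraps, and that root is needed for the cgroup setup it performs):

git clone https://github.com/indeedeng/fibtest
cd fibtest
make
sudo ./runfibtest 1
sudo ./runfibtest 8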
[Result]
Throttled
./runfibtest 1
Iterations Completed(M): 465
Throttled for: 52
CPU Usage (msecs) = 508
./runfibtest 8
Iterations Completed(M): 283
Throttled for: 52
CPU Usage (msecs) = 327
[Multi Thread Test 2]
docker run -it --rm --cpu-quota 10000 --cpu-period 100000 hipeteryang/fibtest:latest /bin/sh -c "runfibtest 8 && cat /sys/fs/cgroup/cpu,cpuacct/cpu.stat && cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage && cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage_percpu"
[Result]
Throttled
Iterations Completed(M): 192
Throttled for: 53
CPU Usage (msecs) = 227
nr_periods 58
nr_throttled 56
throttled_time 10136239267
267434463
209397643 2871651 8044402 4226146 5891926 5532789 27939741 4104364
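The nr_periods, nr_throttled, and throttled_time lines above come from the cgroup's cpu.stat that the command reads inside the container; to turn them into a throttling ratio you could pipe the same file through a small awk helper (illustrative only, same in-container path as above):

awk '/nr_periods/ {p=$2} /nr_throttled/ {t=$2} END {printf "throttled in %d of %d periods (%.1f%%)\n", t, p, 100*t/p}' /sys/fs/cgroup/cpu,cpuacct/cpu.stat

With the numbers above that is 56 of 58 periods, i.e. about 97% of the periods were throttled.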
[Multi Thread Test 3]
docker run -it --rm --cpu-quota 10000 --cpu-period 100000 hipeteryang/stress-ng:cpu-delay /bin/sh -c "stress-ng --taskset 0 --cpu 1 --timeout 5s & stress-ng --taskset 1-7 --cpu 7 --cpu-load-slice -1 --cpu-delay 10 --cpu-method fibonacci --timeout 5s && cat /sys/fs/cgroup/cpu,cpuacct/cpu.stat && cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage && cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage_percpu"
[Result]
Throttled
nr_periods 56
nr_throttled 53
throttled_time 7893876370
379589091
330166879 3073914 6581265 724144 706787 5605273 29455102 3849694
For the following Kubernetes tests, you can use kubectl logs <pod-name> to get the result once the job is done.
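For example, with the Job manifests below (the manifest file name here is just an illustrative placeholder):

kubectl apply -f fibtest-job.yaml
kubectl wait --for=condition=complete job/fibtest --timeout=120s
kubectl logs job/fibtest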
[Multi Thread Test 4]
apiVersion: batch/v1
kind: Job
metadata:
  name: fibtest
  namespace: default
spec:
  template:
    spec:
      containers:
      - name: fibtest
        image: hipeteryang/fibtest
        command: ["/bin/bash", "-c", "runfibtest 8 && cat /sys/fs/cgroup/cpu,cpuacct/cpu.stat && cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage && cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage_percpu"]
        resources:
          requests:
            cpu: "50m"
          limits:
            cpu: "100m"
      restartPolicy: Never
[Result]
Throttled
Iterations Completed(M): 195
Throttled for: 52
CPU Usage (msecs) = 230
nr_periods 56
nr_throttled 54
throttled_time 9667667360
255738571
213621103 4038989 2814638 15869649 4435168 5459186 4549675 5437010
[Multi Thread Test 5]
apiVersion: batch/v1
kind: Job
metadata:
  name: stress-ng-test
  namespace: default
spec:
  template:
    spec:
      containers:
      - name: stress-ng-test
        image: hipeteryang/stress-ng:cpu-delay
        command: ["/bin/bash", "-c", "stress-ng --taskset 0 --cpu 1 --timeout 5s & stress-ng --taskset 1-7 --cpu 7 --cpu-load-slice -1 --cpu-delay 10 --cpu-method fibonacci --timeout 5s && cat /sys/fs/cgroup/cpu,cpuacct/cpu.stat && cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage && cat /sys/fs/cgroup/cpu,cpuacct/cpuacct.usage_percpu"]
        resources:
          requests:
            cpu: "50m"
          limits:
            cpu: "100m"
      restartPolicy: Never
[Result]
Throttled
nr_periods 53
nr_throttled 50
throttled_time 6827221694
417331622
381601924 1267814 8332150 3933171 13654620 184120 6376208 2623172
Feel free to leave any comment, I’ll reply as soon as possible.
Since the fix was backported to many older kernel versions, I do not know how to easily look up whether e.g. 4.9.0-11-amd64 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) x86_64 GNU/Linux has the fix.
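One rough way to try is grepping the Debian changelog of the installed kernel package for the backported patch. This is only a sketch: it assumes the backport is listed under the upstream patch title ("sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices") and that the changelog sits at the usual Debian path, so a missing match is not conclusive.

uname -r
zgrep -i "expiration of cpu-local slices" /usr/share/doc/linux-image-$(uname -r)/changelog.Debian.gz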
But you can measure whether your CFS is working smoothly or is throttling too much, as described in https://gist.github.com/bobrik/2030ff040fad360327a5fab7a09c4ff1: run cfs.go with suitable settings for its sleeps and iterations as well as the CFS settings, e.g.

docker run --rm -it --cpu-quota 20000 --cpu-period 100000 -v $(pwd):$(pwd) -w $(pwd) golang:1.9.2 go run cfs.go -iterations 100 -sleep 1000ms

Check whether each burn took about 5ms. If not, your CFS is throttling too much. This could be e.g. due to the original bug 198197 (see https://bugzilla.kernel.org/show_bug.cgi?id=198197) or the regression introduced by the fix for bug 198197 (for details see https://lkml.org/lkml/2019/5/17/581).

This measurement approach is also taken in https://github.com/kubernetes/kops/issues/8954, showing that Linux kernel 4.9.0-11-amd64 is throttling too much (however, with an earlier Debian 4.9.189-3+deb9u1 (2019-09-20) than your Debian 4.9.189-3+deb9u2 (2019-11-11)).
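You can also look at the counters the kernel already keeps for the pods that are running: a high nr_throttled relative to nr_periods in a container's cpu.stat points in the same direction as the cfs.go measurement. A minimal sketch, assuming cgroup v1 and the default kubepods hierarchy on the node (the exact cgroup layout depends on your cgroup driver):

# Print CFS statistics for every pod/container cgroup on this node.
find /sys/fs/cgroup/cpu,cpuacct/kubepods* -name cpu.stat | while read -r f; do
  echo "== $f"
  cat "$f"
done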