cAdvisor exposes two metrics, container_cpu_cfs_throttled_seconds_total and container_cpu_cfs_throttled_periods_total.
I am confused about what they mean.
I have found two explanations:
Explanation 1: a container runs with a CPU limit; when the container's CPU usage goes over the limit, the container is "throttled" and the throttled time is added to container_cpu_cfs_throttled_seconds_total.
That means:
(1). rate(container_cpu_cfs_throttled_seconds_total) > 0 only when the container's CPU usage goes over its limit.
(2). We can use this metric to alert when a container's CPU usage goes over its limit.
Explanation 2: when the host is under heavy CPU pressure, it "throttles" containers according to Pod QoS class (Guaranteed > Burstable > Best-Effort).
That means:
(1). Increases in container_cpu_cfs_throttled_seconds_total have no relation to how much CPU the container uses or to its CPU limit.
(2). This metric cannot be used to alert when a container's CPU usage goes over its limit.
cAdvisor analyzes metrics for memory, CPU, file, and network usage for all containers running on a given node. However, it doesn't store this data long-term, so you need a dedicated monitoring tool. Since cAdvisor is already integrated with the kubelet binary, there are no special steps required to install it.
Kubernetes metrics help you ensure all pods in a deployment are running and healthy. They provide information such as how many instances a pod currently has and how many were expected. If the number is too low, your cluster may run out of resources.
container_memory_usage_bytes (Total): Total memory usage of a container, regardless of when it was accessed.
container_cpu_cfs_throttled_seconds_total is the sum of all throttle durations, i.e. the durations for which the container was throttled (stopped from running) by CFS cgroup bandwidth control.
Since each stopped thread adds its throttled duration to container_cpu_cfs_throttled_seconds_total, this number can become huge and does not help you (unless you have a known, fixed number of threads).
That is why alerting on CPU throttling is usually based on the throttled percentage
throttled percentage := container_cpu_cfs_throttled_periods_total / container_cpu_cfs_periods_total
i.e. the percentage of CPU periods in which the container ran but was throttled (stopped before it could run for the whole CPU period).
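As a rough illustration of the ratio above, here is a minimal Python sketch that computes it from two samples of each counter taken at the start and end of a time window. The function name and the sample values are made up for illustration; only the two metric names come from cAdvisor.

```python
def throttled_ratio(periods_start, periods_end, throttled_start, throttled_end):
    """Fraction of CFS periods in the window in which the container was throttled.

    Arguments are samples of container_cpu_cfs_periods_total and
    container_cpu_cfs_throttled_periods_total at the start and end of the window.
    """
    elapsed_periods = periods_end - periods_start
    if elapsed_periods <= 0:
        # No period elapsed (or a counter reset): report no throttling.
        return 0.0
    return (throttled_end - throttled_start) / elapsed_periods


# Hypothetical samples: 600 periods elapsed, 150 of them throttled.
ratio = throttled_ratio(1000, 1600, 50, 200)
print(f"{ratio:.0%}")  # 150 / 600 = 25%
```

Because both counters only count periods, the result stays between 0 and 1 regardless of how many threads the container runs, which is what makes it usable as an alert threshold.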
For more detail, you can watch this talk on CFS and CPU scheduling, or read the corresponding article.
Let's say an httpbin container is running on machine1. Let's say httpbin has a limit set in its deployment to use a maximum of 1 CPU, and machine1 has 2 CPUs. That allows httpbin to use at most half of the available CPU.
If the httpbin container tries to use more than 1 CPU, Kubernetes will not kill the container. It will throttle it. If this happens frequently, you may want to get alerted on it and fix the deployment. Another scenario: if there are multiple containers on machine1 and CPU resources are scarce, then it will throttle all the containers it has.
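The httpbin example can be put into numbers. The sketch below assumes the common default CFS period of 100 ms (100000 microseconds); the helper names are hypothetical and the arithmetic only illustrates how a CPU limit translates into a per-period quota.

```python
CFS_PERIOD_US = 100_000  # assumed default CFS period: 100 ms


def cfs_quota_us(cpu_limit, period_us=CFS_PERIOD_US):
    """CPU time (in microseconds) the container may use per CFS period."""
    return int(round(cpu_limit * period_us))


def share_of_node(cpu_limit, node_cpus):
    """Fraction of the node's total CPU capacity the limit allows."""
    return cpu_limit / node_cpus


# httpbin: limit of 1 CPU on a machine with 2 CPUs.
print(cfs_quota_us(1))       # 100000 us of CPU time per 100 ms period
print(share_of_node(1, 2))   # 0.5, i.e. half of machine1
```

Once the container has used up its quota within a period, it is paused until the next period starts; that period is then counted as throttled.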
container_cpu_cfs_throttled_seconds_total is the total time, in seconds, that the container has been throttled. container_cpu_cfs_throttled_periods_total is the number of throttled period intervals.
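To my understanding, these counters correspond to the kernel's per-cgroup cpu.stat file, which under cgroup v1 contains nr_periods, nr_throttled and throttled_time (in nanoseconds). The following is a minimal sketch that parses such text; the sample content is invented, and the cgroup v1 layout is an assumption (cgroup v2 uses different field names).

```python
def parse_cpu_stat(text):
    """Parse 'key value' lines of a cgroup v1 cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


# Invented sample in the assumed cgroup v1 format.
sample = """nr_periods 600
nr_throttled 150
throttled_time 4500000000
"""

stats = parse_cpu_stat(sample)
throttled_seconds = stats["throttled_time"] / 1e9  # nanoseconds to seconds
print(stats["nr_throttled"], throttled_seconds)
```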