How do I get a pod's (milli)core CPU usage with Prometheus in Kubernetes?

Tags:

kubernetes

prometheus

I run a v1.9.2 custom setup of Kubernetes and scrape various metrics with Prometheus v2.1.0. Among others, I scrape the kubelet and cAdvisor metrics.

I want to answer the question: "How much of the CPU resources defined by requests and limits in my deployment are actually used by a pod (and its containers) in terms of (milli)cores?"

There are a lot of scraped metrics available, but nothing like that. Maybe it could be calculated by the CPU usage time in seconds, but I don't know how.

I had assumed it was not possible, until a friend told me she runs Heapster in her cluster, which has a graph in the built-in Grafana that shows exactly that: the individual CPU usage of a pod and its containers in (milli)cores.

Since Heapster also uses kubelet and cAdvisor metrics, I wonder: how can I calculate the same thing? The metric in InfluxDB is named cpu/usage_rate, but even after reading Heapster's code, I couldn't figure out how they calculate it.

Any help is appreciated, thanks!

608

asked Feb 19 '18 18:02

Alex

2 Answers

We use the container_cpu_usage_seconds_total metric to calculate Pod CPU usage. This metric contains the total CPU time, in seconds, consumed per container and per core. This matters because a Pod may consist of multiple containers, each of which can be scheduled across multiple cores; however, the metric carries a pod_name label that we can use for aggregation. Of special interest is the rate of change of that metric, which can be calculated with PromQL's rate() function. If the counter increases by 1 within one second, the Pod is consuming 1 CPU core (or 1000 millicores) during that second.

The following PromQL query does just that: it computes the CPU usage of every Pod (using the sum(...) by (pod_name) aggregation), averaged over a five-minute window:

sum(rate(container_cpu_usage_seconds_total[5m])) by (pod_name)
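
To make the arithmetic behind rate() concrete, here is a minimal Python sketch that turns two readings of a CPU-seconds counter into millicores. The function name and the sample values are hypothetical, and the sketch ignores counter resets and the extrapolation that Prometheus itself applies, so treat it as an illustration of the idea rather than a reimplementation of rate():

```python
def cpu_millicores(value_start, time_start, value_end, time_end):
    """Average CPU usage in millicores between two counter samples.

    value_* are readings of a CPU-seconds counter, time_* are the sample
    timestamps in seconds. Counter resets are not handled in this sketch.
    """
    elapsed = time_end - time_start
    if elapsed <= 0:
        raise ValueError("time_end must be later than time_start")
    cores = (value_end - value_start) / elapsed  # CPU seconds per second
    return cores * 1000.0  # 1 core = 1000 millicores


# Hypothetical samples: the counter grew by 75 CPU seconds over 5 minutes.
usage = cpu_millicores(100.0, 0.0, 175.0, 300.0)
print(usage)  # 250.0
```

In other words, 75 CPU seconds spread over a 300-second window is a quarter of a core on average, which is the same number the query reports as 0.25 for that Pod.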

189

answered Nov 19 '22 21:11

helmbert

The following PromQL query returns the number of CPU cores used per pod on Kubernetes v1.16 and newer:

sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod)

The {container!=""} filter excludes the cgroup hierarchy series, whose usage is already accounted for by the per-container series. See this answer for more details.

For Kubernetes versions below v1.16, use the following PromQL query instead, because those versions use different label names (e.g. container_name instead of container and pod_name instead of pod; see this issue for details):
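
The effect of that filter can be sketched in a few lines of Python. The pod name, container names and rates below are made up for illustration; the only assumption carried over from the answer is that a series with an empty container label reports usage that the per-container series already cover:

```python
from collections import defaultdict

# Made-up per-series CPU rates in cores, as rate(...[5m]) might return them.
# The series with an empty container label stands for the pod-level cgroup,
# which overlaps with the usage reported for the two containers.
samples = [
    {"pod": "web-1", "container": "", "cores": 0.75},
    {"pod": "web-1", "container": "app", "cores": 0.5},
    {"pod": "web-1", "container": "sidecar", "cores": 0.25},
]


def sum_by_pod(series, skip_empty_container):
    """Sum per-series rates by pod, optionally dropping container == ""."""
    totals = defaultdict(float)
    for s in series:
        if skip_empty_container and s["container"] == "":
            continue
        totals[s["pod"]] += s["cores"]
    return dict(totals)


print(sum_by_pod(samples, skip_empty_container=False))  # {'web-1': 1.5}
print(sum_by_pod(samples, skip_empty_container=True))   # {'web-1': 0.75}
```

Without the filter the pod appears to use twice as much CPU as it really does, because the same usage is counted once at the cgroup level and once per container.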

sum(rate(container_cpu_usage_seconds_total{container_name!=""}[5m])) by (pod_name)
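
If the same dashboard or script has to work against clusters on both sides of that version boundary, the choice of label names can be made programmatically. The helper below is a hypothetical sketch (pod_cpu_query is not part of any library) that simply reproduces the two queries from this answer based on a version string such as "v1.9.2":

```python
def pod_cpu_query(kubernetes_version, window="5m"):
    """Build the per-pod CPU usage query for the given Kubernetes version.

    Assumes a version string like "v1.9.2" or "1.16.0"; only the major and
    minor parts are inspected.
    """
    parts = kubernetes_version.lstrip("v").split(".")[:2]
    major, minor = (int(part) for part in parts)
    if (major, minor) >= (1, 16):
        container_label, pod_label = "container", "pod"
    else:
        container_label, pod_label = "container_name", "pod_name"
    return (
        f'sum(rate(container_cpu_usage_seconds_total{{{container_label}!=""}}[{window}]))'
        f" by ({pod_label})"
    )


print(pod_cpu_query("v1.9.2"))
print(pod_cpu_query("v1.16.0"))
```

For the v1.9.2 cluster from the question this yields the query with container_name and pod_name.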

answered Nov 19 '22 21:11

valyala
