Monitoring CPU Utilization using Prometheus

I am trying to monitor the CPU utilization of the machine on which Prometheus is installed and running. I have the metric process_cpu_seconds_total, and I can compute its irate or rate, but I am not sure how to turn that into a percentage value for CPU utilization. Is there any way I can use process_cpu_seconds_total to find the CPU utilization of the machine where Prometheus runs?

Arnav Bose asked Feb 21 '18 22:02


2 Answers

A late answer for others' benefit too:

If you just want to monitor the percentage of CPU that the prometheus process itself uses, you can use process_cpu_seconds_total, e.g. something like:

avg by (instance) (irate(process_cpu_seconds_total{job="prometheus"}[1m]))

However, if you want to monitor the CPU of the whole machine, as I suspect you do, you should set up Node exporter and then use a query similar to the one above with the metric node_cpu_seconds_total, e.g.:

avg by (instance,mode) (irate(node_cpu_seconds_total{mode!='idle'}[1m]))

The rate or irate of these counters is already the utilization expressed as a fraction of 1 (multiply by 100 for a percentage), since it is the number of CPU seconds used per second of elapsed time, but it usually needs to be aggregated across the cores/CPUs on the machine.
Brian Brazil's post on Prometheus CPU monitoring is very relevant and useful: https://www.robustperception.io/understanding-machine-cpu-usage
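
To make the "seconds used of a second" idea concrete, here is a minimal Python sketch of the arithmetic that a rate followed by an average across cores performs. The sample values and the helper name per_second_rate are made up for illustration; this is not how Prometheus is implemented, only the same calculation done by hand on two samples per core.

```python
from statistics import mean


def per_second_rate(first, last):
    """Per-second increase between two (timestamp, counter_value) samples."""
    (t0, v0), (t1, v1) = first, last
    return (v1 - v0) / (t1 - t0)


# Hypothetical samples of a non-idle CPU-seconds counter for each core,
# taken 15 s apart: (timestamp_seconds, cumulative_cpu_seconds).
samples = {
    "cpu0": ((0.0, 100.0), (15.0, 107.5)),  # 7.5 busy seconds in 15 s
    "cpu1": ((0.0, 200.0), (15.0, 203.0)),  # 3.0 busy seconds in 15 s
}

# Each per-core rate is a fraction between 0 and 1 (CPU seconds per second).
per_core = {cpu: per_second_rate(a, b) for cpu, (a, b) in samples.items()}

# Averaging across cores gives the machine-wide fraction; times 100 is a percentage.
machine_fraction = mean(per_core.values())
machine_percent = machine_fraction * 100

print(per_core)
print(f"machine utilization: {machine_percent:.1f}%")
```

Averaging (rather than summing) across cores keeps the result in the 0 to 1 range regardless of how many cores the machine has.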

lambfrier answered Oct 12 '22 05:10


One way to do this is to rely on cgroup resource reporting. Cgroups express CPU allocation in shares, with 1024 as the default weight, so knowing what proportion of the shares a process accounts for gives you a handle on its percentage of CPU utilization.

In your case you already have the rate of change of CPU seconds, which is how much CPU time the process used in the last time unit (assume 1 s from here on). How that maps to a percentage depends on how many cores you have: one core that is busy for the whole 1 s unit contributes 1 CPU second. So if your rate of change is 3 and you have 4 cores:

3/4 = 75% CPU utilization.
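
The same calculation as a small Python sketch. The function name cpu_utilization_percent is hypothetical, and the core count has to be supplied by you; it is an assumption here, not something process_cpu_seconds_total reports.

```python
def cpu_utilization_percent(cpu_seconds_per_second, cores):
    """Convert a CPU-seconds rate into a percentage of total machine capacity.

    cpu_seconds_per_second: rate of change of the CPU-seconds counter.
    cores: number of cores on the machine (must be known separately).
    """
    if cores <= 0:
        raise ValueError("cores must be a positive number")
    return cpu_seconds_per_second / cores * 100


# The example from the answer: a rate of 3 on a 4-core machine.
print(cpu_utilization_percent(3, 4))  # 75.0
```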

This is only a rough estimate, as your process_cpu_seconds_total value is probably not very accurate due to delay, latency, etc.

ZhijieWang answered Oct 12 '22 07:10