I can pull the following CPU values from libvirt:
virsh domstats vm1 --cpu-total
Domain: 'vm1'
cpu.time=6173016809079111
cpu.user=26714880000000
cpu.system=248540680000000
virsh cpu-stats vm1 --total
Total:
cpu_time 6173017.263233824 seconds
user_time 26714.890000000 seconds
system_time 248540.700000000 seconds
What does the cpu_time figure represent here exactly?
I'm looking to calculate CPU utilization as a percentage using this data.
Thanks
This was a surprisingly difficult question to answer! After poring over the kernel code for a good while, I've figured out what's going on here, and it's quite nice to finally understand it.
Normally, for a process on Linux, the overall CPU usage is simply the sum of the time spent in userspace and the time spent in kernel space. So naively one would expect user_time + system_time to equal cpu_time. What I've discovered is that Linux tracks time spent by vCPU threads executing guest code separately from both userspace and kernel-space time.
Thus cpu_time == user_time + system_time + guest_time
So you can think of system_time + user_time as giving the overhead of QEMU / KVM on the host side, and cpu_time - (user_time + system_time) as giving the actual amount of time the guest OS was running its CPUs.
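As a rough illustration of that split, here is a minimal Python sketch that applies the relation to the figures from the question's `virsh cpu-stats vm1 --total` output. The function name `split_cpu_time` is made up for this example and is not part of libvirt.

```python
def split_cpu_time(cpu_time, user_time, system_time):
    """Split total CPU time (seconds) into host-side overhead and guest time.

    Assumes cpu_time == user_time + system_time + guest_time, as described above.
    """
    overhead = user_time + system_time   # QEMU / KVM overhead on the host
    guest = cpu_time - overhead          # time the guest actually ran its vCPUs
    return overhead, guest


# Figures copied from the question (all in seconds).
overhead, guest = split_cpu_time(6173017.263233824, 26714.89, 248540.70)
print(f"host overhead: {overhead:.2f} s")
print(f"guest time:    {guest:.2f} s")
```

For this guest, the vast majority of cpu_time is guest time, which is what you would expect for a VM that is mostly running its own workload.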
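To calculate CPU usage, you probably just want to record cpu_time every N seconds and calculate the delta between two samples, e.g. usage % = 100 * (cpu_time_2 - cpu_time_1) / N, with cpu_time expressed in seconds (the cpu.time value from domstats is in nanoseconds, so divide it by 10^9 first).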
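The sampling approach could be sketched in Python roughly as follows. This is only an illustration: `parse_domstats` and `cpu_usage_percent` are hypothetical helper names, the second sample is invented for the example, and the optional `vcpus` divisor is an extra assumption (not part of the formula above) for anyone who wants the result normalized to 0-100% on a multi-vCPU guest.

```python
def parse_domstats(text):
    """Parse `virsh domstats` output into a dict of integer counters.

    Lines without a numeric key=value pair (such as "Domain: 'vm1'") are skipped.
    """
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and value.isdigit():
            stats[key] = int(value)
    return stats


def cpu_usage_percent(cpu_time_1_ns, cpu_time_2_ns, interval_s, vcpus=1):
    """Return 100 * (cpu_time_2 - cpu_time_1) / N for two cpu.time samples.

    cpu.time is in nanoseconds, so the delta is converted to seconds first.
    Dividing by `vcpus` is an optional normalization assumed for this sketch.
    """
    delta_s = (cpu_time_2_ns - cpu_time_1_ns) / 1e9
    return 100.0 * delta_s / (interval_s * vcpus)


# First sample: output from the question. Second sample: invented, 10 s later.
sample_1 = "Domain: 'vm1'\n  cpu.time=6173016809079111\n"
sample_2 = "Domain: 'vm1'\n  cpu.time=6173021809079111\n"

t1 = parse_domstats(sample_1)["cpu.time"]
t2 = parse_domstats(sample_2)["cpu.time"]
print(cpu_usage_percent(t1, t2, interval_s=10))
```

In practice the two samples would come from running `virsh domstats vm1 --cpu-total` twice, N seconds apart, and feeding each output through the parser.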
As of master pulled on 2018-07-10 from https://github.com/libvirt/libvirt/, and as far as QEMU/KVM is concerned, it comes down to:
- cpu.time = the cpuacct.usage cgroup metric
- cpu.{user,system} = the cpuacct.stat cgroup metrics
One problem you may encounter is that guest load = time load - system load - user load sometimes yields negative values (?!?). Here is an example for a running QEMU/KVM guest (values in seconds) on the stock Debian 9 kernel (4.9):
timestamp               system      user       total
2018-07-10T13:19:20Z  62308.67   9278.59   107968.33
2018-07-10T13:20:20Z  62316.08   9279.73   107970.73
delta                     7.41      1.14        2.40   (2.40 < 7.41 + 1.14 ?!?)
Kernel bug? (At least one other person has experienced something similar: https://lkml.org/lkml/2017/11/1/101)
One thing is certain: cpuacct.usage and cpuacct.stat use different logic to gather their metrics; this might explain the discrepancy (?).
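To make the discrepancy concrete, here is a small Python sketch that recomputes the deltas from the two samples in this answer and derives the guest share from them. The helper name `guest_time_delta` and the optional clamping to zero are assumptions made for this example; clamping only hides the symptom and does not fix the underlying accounting mismatch.

```python
def guest_time_delta(before, after, clamp=False):
    """Return (deltas, guest) for two samples of system/user/total seconds.

    guest = total delta - system delta - user delta, which can come out
    negative when the two cgroup sources disagree. With clamp=True the
    guest value is floored at 0.0 (a workaround assumed for this sketch).
    """
    deltas = {key: after[key] - before[key] for key in before}
    guest = deltas["total"] - deltas["system"] - deltas["user"]
    if clamp:
        guest = max(0.0, guest)
    return deltas, guest


# The two samples from the example above (values in seconds).
before = {"system": 62308.67, "user": 9278.59, "total": 107968.33}
after = {"system": 62316.08, "user": 9279.73, "total": 107970.73}

deltas, guest = guest_time_delta(before, after)
print({key: round(value, 2) for key, value in deltas.items()})
print(round(guest, 2))  # negative for this pair of samples
```

Averaging over a longer sampling interval may smooth out such artifacts, but that is a guess rather than something verified here.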