I am pretty new to Prometheus.
I am trying to predict the CPU usage for the next 5 hours on node 1 and node 2.
My code is:
```
predict_linear(cpu_usage[5m], 5*3600)
```
Since cpu_usage includes both nodes, the expression above returns two predictions, one per node, which is not what I want.
So I changed my code to:
```
sum(predict_linear(cpu_usage[5m], 5*3600))
```
I am not sure whether this is the right way to do it. I read the documentation, which says that predict_linear should only be used with gauges.
Thanks, guys.
Proactive monitoring doesn't really apply to CPU. It is intended for system resources that can be exhausted, such as memory or disk space. There is nothing wrong with 100% CPU usage, provided it doesn't mean there is a performance issue with your application.
If you really have benchmarks showing that the CPU shouldn't reach 100%, handle it on a reactive basis: you want to be alerted if the CPU is stuck at 100% for a given amount of time.
Regarding your question, predicting the next 5 hours from 5 minutes of data will be really noisy. It is not uncommon for an application to ramp up consumption over a few minutes (even tens of minutes). Moreover, even if the application's memory usage has the shape of a perfect step, the predict_linear() function uses linear regression and, whenever the window straddles the step, will compute a nonzero rate averaged between the bottom and the top of the step.
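To make the step effect concrete, here is a minimal sketch in plain Python. It only mimics the idea of a least-squares fit over a window and is not Prometheus's actual implementation; the helper names (`fit_line`, `extrapolate`) and the sample data are made up for illustration.

```python
def fit_line(samples):
    """Ordinary least-squares fit; returns (slope, intercept) for (t, v) pairs."""
    n = len(samples)
    t_mean = sum(t for t, _ in samples) / n
    v_mean = sum(v for _, v in samples) / n
    cov = sum((t - t_mean) * (v - v_mean) for t, v in samples)
    var = sum((t - t_mean) ** 2 for t, _ in samples)
    slope = cov / var
    return slope, v_mean - slope * t_mean


def extrapolate(samples, horizon_s):
    """Extend the fitted line horizon_s seconds past the last sample."""
    slope, intercept = fit_line(samples)
    t_last = samples[-1][0]
    return intercept + slope * (t_last + horizon_s)


# Hypothetical data: one sample every 15 s for 5 minutes, values in GiB.
# Usage jumps from 1.0 to 1.1 GiB at t = 150 s and is flat otherwise.
step = [(t, 1.0 if t < 150 else 1.1) for t in range(0, 301, 15)]
flat = [(t, 1.1) for t in range(0, 301, 15)]

slope_per_s, _ = fit_line(step)
print(f"fitted slope: {slope_per_s * 60 * 1024:.1f} MiB/min")
print(f"5 h prediction (step): {extrapolate(step, 5 * 3600):.2f} GiB")
print(f"5 h prediction (flat): {extrapolate(flat, 5 * 3600):.2f} GiB")
```

With this made-up data the fit sees a slope of roughly 29 MiB/min and predicts about 9.7 GiB after 5 hours, although usage never goes above 1.1 GiB. A longer window dilutes the step and gives a much smaller slope.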
This small rate of consumption quickly adds up when extrapolated to 5 hours. As an example, if your node is provisioned as medium size (4GiB) and memory consumption is near 0 at t0, the largest rate that does not predict memory exhaustion within 5 hours is 4GiB/(5*60min) =~ 13.7MiB/min. If you alert on anything above that, you will get plenty of false positives.
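The threshold arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation in the example; `max_quiet_rate` is a hypothetical helper name, and it assumes a constant growth rate over the whole horizon.

```python
def max_quiet_rate(capacity_mib, used_mib, horizon_min):
    """Largest constant growth rate (MiB/min) that does not reach
    capacity within the horizon, starting from used_mib."""
    return (capacity_mib - used_mib) / horizon_min


capacity_mib = 4 * 1024   # 4 GiB node, as in the example
horizon_min = 5 * 60      # 5-hour prediction horizon

rate = max_quiet_rate(capacity_mib, 0, horizon_min)
print(f"threshold rate: {rate:.2f} MiB/min")
```

Any fitted slope above this value makes the 5-hour prediction cross 4 GiB, and the margin shrinks further once some memory is already in use.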
I found it useful to:
- use the for clause in rules to reduce false positives

Finally, there are many points in your question: