
How to use forecasting in Prometheus with predict_linear

Tags: prometheus

I am pretty new to Prometheus.

I am trying to predict the next 5 hours of CPU usage on node 1 and node 2.

My code is

'''

    predict_linear(cpu_usage[5m], 5*3600)

'''

Since cpu_usage includes two nodes, the expression above returns two predictions, which is not what I want.

So I changed my code to

'''

    sum(predict_linear(cpu_usage[5m], 5*3600))

'''

I am not sure whether this is the right way. I read the documentation, and it mentions that predict_linear is only for gauges.

Thanks guys,

Asked by qing zhang, Oct 22 '25 14:10


1 Answer

Proactive monitoring doesn't really apply to CPU. It is rather intended for the exhaustion of system resources such as memory or disk space. There is nothing wrong with 100% CPU usage as long as it doesn't cause a performance issue for your application.

If you do have benchmarks showing that the CPU shouldn't reach 100%, the alert should rather be reactive: you want to be alerted if the CPU stays at 100% for a given amount of time.

Regarding your question, predicting the next 5 hours from 5 minutes of data will be really noisy. It is not uncommon for an application to ramp up consumption over a few minutes (even tens of minutes). Moreover, even if the application's memory usage is a perfect step, predict_linear() uses linear regression and will, at some point, compute a slope averaged between the bottom and the top of the step.
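To get a feel for how large that averaged slope can be, here is a rough sketch under simplifying assumptions of my own: the samples are treated as continuous and evenly spread over a window of length W, the series is 0 until time s inside the window and H afterwards, and T is the prediction horizon. The symbols W, s, H and T are introduced only for this illustration.

```latex
\begin{align*}
\text{slope} &= \frac{\operatorname{Cov}(t, v)}{\operatorname{Var}(t)}
              = \frac{H\,s\,(W - s)/(2W)}{W^{2}/12}
              = \frac{6\,H\,s\,(W - s)}{W^{3}} \\
\text{slope}\,\big|_{s = W/2} &= \frac{3H}{2W} \\
\text{extrapolated growth over } T &= \frac{3H}{2W}\,T
  \quad\Rightarrow\quad
  \frac{3H}{2}\cdot\frac{300\ \text{min}}{5\ \text{min}} = 90\,H
\end{align*}
```

Under these assumptions, with W = 5 min and T = 5 h, a step worth about 1% of capacity sitting in the middle of the window is extrapolated to roughly 90% of capacity five hours later.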

And this small rate of consumption quickly adds up when extrapolated to 5 hours. As an example, if your node is provisioned as medium size (4 GiB) and memory consumption is near 0 at t0, the highest rate that does not predict memory exhaustion is 4 GiB / (5 × 60 min) ≈ 13.7 MiB/min. If you alert on that, you will get plenty of false positives.

I found it useful to:

  • increase the range of data used for the measurement (rule of thumb: ~20% to 25% of the prediction horizon, so 1h for a 5h prediction)
  • adapt the for clause in the rules to reduce false positives
  • add a limit on current consumption: if the current level is below 60%, chances are the predicted outage is not real
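
The three points above could be combined in a single alert expression. This is only a sketch: it assumes memory is exported by node_exporter as node_memory_MemAvailable_bytes and node_memory_MemTotal_bytes (not metrics from the question), and the 1h range and 60% threshold are just the rules of thumb from the list.

```promql
# Predict available memory 5 hours ahead from the last hour of samples,
# and fire only if the prediction drops below zero...
predict_linear(node_memory_MemAvailable_bytes[1h], 5 * 3600) < 0
and
# ...while current consumption is already above 60% (less than 40% available).
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.4
```

The for clause is not part of the expression; it is set on the alerting rule that wraps it (for example for: 30m, a value chosen here purely for illustration), so the condition has to hold continuously before the alert fires.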

Finally, there are several points in your question:

  • computing the sum of CPU: you'd rather compute the average, which gives you the overall CPU usage. I never found that especially useful, since an application may be stuck on one CPU and be CPU-bound
  • two prediction results: I expect that is what you want, since each CPU should be alerted on individually
  • predict_linear is only for gauges: it can be applied to counters but, as stated at the beginning of this answer, it is rather used for resource exhaustion, and you wouldn't measure such a resource with a counter.
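
For reference, the first two points might look like this in PromQL. It is a sketch that reuses the cpu_usage metric name from the question and assumes it is a gauge; the 1h range follows the rule of thumb given earlier rather than the original 5m.

```promql
# Per-node prediction: one result per cpu_usage series, so each node can be alerted on individually.
predict_linear(cpu_usage[1h], 5 * 3600)

# Overall prediction: average the per-node predictions into a single value.
avg(predict_linear(cpu_usage[1h], 5 * 3600))
```

Replacing avg with sum, as in the question, adds the two nodes' predictions together, so the result no longer reads as a usage level for either node.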

Answered by Michael Doubez, Oct 25 '25 03:10


