
Prometheus query to calculate avg_over_time up-time, but want to ignore down-time less than 1 minute

I am new to Prometheus and wrote the query below to display the average up-time of a website as a percentage for SLA monitoring (using Google as an example).

(avg_over_time(probe_success{instance="https://www.google.com/"}[$__range])) * 100 

However, is it possible to make the calculation ignore any down-time that lasts less than 1 minute?
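
To make the problem concrete, here is a minimal Python sketch of what the plain average does with a short blip. The sample data and the 15s probe interval are assumptions for illustration, not taken from a real setup.

```python
def uptime_percent(samples):
    """Plain average of 0/1 probe results, like avg_over_time(...) * 100."""
    if not samples:
        raise ValueError("no samples in range")
    return 100 * sum(samples) / len(samples)


# Hypothetical history: 20 probes at an assumed 15s interval,
# with two consecutive failed probes in the middle.
samples = [1] * 10 + [0, 0] + [1] * 8
print(uptime_percent(samples))  # 90.0
```

Two failed probes (about 30s under the assumed interval) pull the result down to 90%, which is the effect the question wants to exclude.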

Sion asked Oct 20 '25 18:10



1 Answer

The best way to approach SLAs for probes is to use a quantile function, such as:

quantile_over_time(0.99, probe_success{instance="https://www.google.com/"}[$__range])

This is not the exact query you need, but it helps to start from the basics with quantiles in mind.


That said, to answer the question directly and ignore down-times shorter than 1 minute, this can help:

avg_over_time(((avg_over_time(probe_success{instance="https://www.google.com/"}[75s]) * 75) > bool(60))[$__range:]) * 100

Let's dissect this query:

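avg_over_time(probe_success{instance="https://www.google.com/"}[75s]) takes the average of the probe results over the last 75s, giving the fraction (0 to 1) of successful probes in a window slightly longer than the 1-minute down-times we want to ignore. Call this UP_TIME_PERCENTAGE.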

UP_TIME_PERCENTAGE * 75 gives the up-time in seconds over the past 75s. Call this UP_TIME_75S.

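UP_TIME_75S > bool(60) produces a series of 1s and 0s indicating whether the up-time within the window was more than a minute. The bool modifier makes the comparison return 1 or 0 instead of filtering out the samples that fail it. Call this IS_UP_MORE_THAN_1M.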

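avg_over_time(IS_UP_MORE_THAN_1M[$__range:]) * 100 gives the percentage of evaluation points in the given $__range at which the up-time was more than 1m. Note the trailing : in [$__range:]. It turns the inner expression into a subquery, which is required to apply an ..._over_time function to an expression rather than to a plain series selector.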
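
The four steps above can be mimicked offline in Python to see how the query behaves on a given probe history. This is only a sketch under stated assumptions: probes arrive at a fixed 15s interval, the subquery is evaluated once per probe, and each 75s window holds exactly five samples (real Prometheus window boundaries and subquery resolution may differ). The function name and the sample data are hypothetical.

```python
def adjusted_uptime_percent(samples, interval_s=15, window_s=75, threshold_s=60):
    """Rough offline imitation of the nested avg_over_time query.

    samples: list of 0/1 probe results, assumed evenly spaced by interval_s.
    """
    if not samples:
        raise ValueError("no samples in range")
    per_window = window_s // interval_s  # assumption: 5 samples per 75s window
    flags = []
    for i in range(len(samples)):
        # Trailing window ending at sample i (shorter at the very start).
        window = samples[max(0, i - per_window + 1): i + 1]
        # UP_TIME_PERCENTAGE * 75, i.e. UP_TIME_75S (seconds up in the window).
        up_seconds = sum(window) * window_s / len(window)
        # IS_UP_MORE_THAN_1M: the `> bool(60)` comparison.
        flags.append(1 if up_seconds > threshold_s else 0)
    # Outer avg_over_time(...) * 100 over all evaluation points.
    return 100 * sum(flags) / len(flags)


# Hypothetical history: 20 probes, one failed probe in the middle.
history = [1] * 10 + [0] + [1] * 9
print(adjusted_uptime_percent(history))  # 75.0
```

With these assumed numbers, a single failed probe lowers every 75s window that contains it to exactly 60s of up-time, which does not pass the > 60 check, so five evaluation points count as down. It is worth trying the window and threshold against your own scrape interval before relying on the result.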

droidbot answered Oct 23 '25 07:10



