I'm monitoring some services with blackbox_exporter and Prometheus. This works great for calculating service availability, but I'm wondering whether it is possible to get a summary of downtime ranges over the last x days with PromQL.
For example, if probe_success turns 0 between 1 PM and 1:30 PM and then again from 3 PM to 3:15 PM, I want to get a list like this one in Grafana:
Downtime:
1 PM - 1:30 PM | 30 mins
3 PM - 3:15 PM | 15 mins
and so on.
What you are asking is difficult with PromQL. Prometheus is a time series database and you want to recover the events from those metrics.
There is a way to recover the events where the status 0/1 of a metric changed:
Use the changes() function with a range matching the poll interval of your metric to extract the change events. If the range does not match the poll interval, you will see duplicated changes or may miss some events.
changes(metric[30s]) != 0
Then multiply by the actual metric value to identify whether the switch was to up (1) or down (0):
(changes(metric[30s]) != 0) * metric
You can visualize the output using a subquery:
((changes(metric[30s]) != 0) * metric)[2d:]
0 @1627421720
1 @1627427120
0 @1627508120
1 @1627513520
The value gives you the new state, and the timestamp (after @) gives you the epoch time of the event (approximately depending on poll time).
We are not far from what you want, the difficulty being the way to take those metrics and transform them into the consolidated table.
I am using Grafana v8.0.4 at the time of this answer and I don't see a way to integrate that into the current table visualization. My best advice would be to use an HTML panel and run your own JavaScript to display what you want.
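To make the consolidation step concrete, here is a minimal sketch in Python of how the change events could be paired into downtime ranges. It assumes you have already fetched the subquery result yourself (for example through the Prometheus HTTP API, which returns sample values as strings) and flattened it into (timestamp, value) pairs sorted by time. The function names downtime_ranges and format_range are made up for this illustration. The same logic could be ported to JavaScript for an HTML panel.

```python
from datetime import datetime, timezone


def downtime_ranges(events, now=None):
    """Pair 'went down' (0) and 'came back up' (non-zero) events into ranges.

    events: iterable of (epoch_seconds, value) pairs sorted by time.
            value may be a number or a numeric string.
    now:    optional epoch used to close an outage that is still ongoing.
    Returns a list of (start_epoch, end_epoch) tuples.
    """
    ranges = []
    down_since = None
    for ts, value in events:
        if float(value) == 0:
            # Keep the first 'down' event; ignore duplicated ones.
            if down_since is None:
                down_since = ts
        elif down_since is not None:
            ranges.append((down_since, ts))
            down_since = None
    # An outage without a matching 'up' event is only reported if 'now' is given.
    if down_since is not None and now is not None:
        ranges.append((down_since, now))
    return ranges


def format_range(start, end):
    """Render one range as 'start - end UTC | N mins'."""
    fmt = "%Y-%m-%d %H:%M"
    s = datetime.fromtimestamp(start, tz=timezone.utc).strftime(fmt)
    e = datetime.fromtimestamp(end, tz=timezone.utc).strftime(fmt)
    minutes = int(end - start) // 60
    return f"{s} - {e} UTC | {minutes} mins"


# The four events from the sample output above, as (timestamp, value) pairs.
events = [
    (1627421720, "0"),
    (1627427120, "1"),
    (1627508120, "0"),
    (1627513520, "1"),
]

print("Downtime:")
for start, end in downtime_ranges(events):
    print(format_range(start, end))
```

With the sample events above this yields two ranges of 5400 seconds (90 minutes) each. The timestamps are only as precise as the poll interval, so the durations are approximate.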