How to get the overall uptime of a server with Prometheus and node_exporter

Tags:

prometheus-node-exporter

I'm looking for a query that gives me the average uptime, over the last week, of the server on which Prometheus runs. It should be about 15h/week, i.e. roughly 8-10%.

I'm using Prometheus 2.5.0 with node_exporter on CentOS 7.6.1810. My most promising attempts so far:

1. avg_over_time(up{job="prometheus"}[7d])

This is what I found when looking for ways to compute average uptime, but it gives me exactly 1. (My guess is that it ignores the periods in which no scrapes happened?)

2. sum_over_time(up{job="prometheus"}[7d]) * 15 / 604800

This technically works, but it depends on the scrape interval, which is 15s in my case (604800 is the number of seconds in a week). I can't find a way to get that interval from Prometheus' config, so I have to hardcode it into the query.
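The two attempts behave differently because of one fact: `up{job="prometheus"}` is recorded by Prometheus itself, so no samples at all exist while it is down. The plain-Python model below (not Prometheus code) illustrates this. It assumes a hypothetical week in which Prometheus ran for a single 15-hour stretch and scraped itself every 15 seconds.

```python
WEEK_SECONDS = 7 * 24 * 3600  # 604800, the denominator used in attempt 2
SCRAPE_INTERVAL = 15          # seconds; this is what attempt 2 has to hardcode

# Hypothetical: Prometheus ran for one 15-hour stretch starting at t = 3600 s.
# Every scrape while it runs writes up=1; while it is down, nothing is written.
run_start, run_end = 3600, 3600 + 15 * 3600
samples = [1.0 for t in range(run_start, run_end, SCRAPE_INTERVAL)]

# Attempt 1: avg_over_time only averages the samples that actually exist.
avg_over_time = sum(samples) / len(samples)

# Attempt 2: number of up samples * scrape interval = seconds up, divided by a week.
uptime_fraction = sum(samples) * SCRAPE_INTERVAL / WEEK_SECONDS

print(avg_over_time)              # 1.0
print(round(uptime_fraction, 4))  # 0.0893, i.e. about 8.9%
```

The first figure can never drop below 1 because the missing periods contribute no samples. The second is correct only as long as the hardcoded interval matches the real one.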

I've also tried to find ways to get all start and end times of a job, but to no avail thus far.


asked Sep 24 '19 12:09

npath

1 Answer

Here you go. Don't ask. (o:

avg_over_time(
  (
    sum without() (up{job="prometheus"})
      or
    (0 * sum_over_time(up{job="prometheus"}[7d]))
  )[7d:5m]
)

To explain that bit by bit:

sum without() (up{job="prometheus"}): takes the up metric (the sum without() part drops the metric name while keeping all other labels);
0 * sum_over_time(up{job="prometheus"}[7d]): produces a zero-valued sample for every up{job="prometheus"} label combination seen over the past week (e.g. in case you have multiple Prometheus instances);
or: combines the two, so you get the actual value where available and zero where it is missing;
[7d:5m]: a PromQL subquery; it evaluates the preceding expression every 5 minutes over the past 7 days and returns the results as a range vector (subqueries were added in Prometheus 2.7.0, so this requires upgrading from 2.5.0);
avg_over_time: averages the up values over that range, with zeroes filled in wherever samples were missing.
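The walkthrough above can be checked against a simplified plain-Python model of the query. The model has several assumptions:

- It covers a single series.
- The instant selector `up` sees a sample if one exists within the default 5-minute lookback delta.
- Subquery steps fall every 5 minutes ending at `now`.
- Staleness markers are ignored.

The helpers `up_samples`, `any_sample_in` and `filled_uptime` are hypothetical names. The hypothetical scenario has Prometheus up for the first 15 hours of the week. The model suggests the result tracks the true fraction of the week without knowing the scrape interval, provided the interval is shorter than the lookback.

```python
import bisect

WEEK = 7 * 24 * 3600  # the [7d] range and the subquery range, in seconds
STEP = 5 * 60         # subquery resolution, the 5m in [7d:5m]
LOOKBACK = 5 * 60     # assumed default lookback delta for instant selectors

def up_samples(scrape_interval, run_start=0, run_end=15 * 3600):
    """Hypothetical self-scrape timestamps: Prometheus is up only in [run_start, run_end)."""
    return list(range(run_start, run_end, scrape_interval))

def any_sample_in(ts, lo, hi):
    """True if some timestamp in the sorted list ts falls in (lo, hi]."""
    i = bisect.bisect_right(ts, lo)
    return i < len(ts) and ts[i] <= hi

def filled_uptime(ts, now=WEEK):
    """Simplified model of avg_over_time((up or 0 * sum_over_time(up[7d]))[7d:5m])."""
    values = []
    for t in range(now - WEEK + STEP, now + 1, STEP):  # one evaluation per subquery step
        if any_sample_in(ts, t - LOOKBACK, t):         # left side of `or`: up is present
            values.append(1.0)
        elif any_sample_in(ts, t - WEEK, t):           # right side: the zero filler
            values.append(0.0)
        # otherwise both sides are empty and the step yields nothing
    return sum(values) / len(values) if values else None  # avg_over_time

for interval in (15, 30, 60):
    print(interval, round(filled_uptime(up_samples(interval)), 4))  # 0.0893 each time
```

All three intervals give the same 180 "up" steps out of 2016. This matches the 54000/604800 computed with a hardcoded interval.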

You may also want to append and sum_over_time(up{job="prometheus"}[7d]) to the end of that expression, so that you only get results for label combinations that existed at some point over the previous 7 days. Otherwise, because a 7-day range is evaluated inside a 7-day subquery, the query looks back up to 14 days and returns results for every label combination seen over that period.
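The effect of that extra `and` clause can be sketched with the same kind of simplified model: single series, assumed 5-minute default lookback, staleness ignored, hypothetical helper names. The scenario is a hypothetical instance that was only up for one hour, 10 days ago. Without the filter it still gets a (spurious) zero result, because the zero filler's 7-day range reaches back past the start of the subquery window. With the filter, the series is dropped.

```python
import bisect

DAY = 24 * 3600
WEEK = 7 * DAY
STEP = 5 * 60       # subquery resolution, [7d:5m]
LOOKBACK = 5 * 60   # assumed default lookback delta for instant selectors

def any_sample_in(ts, lo, hi):
    """True if some timestamp in the sorted list ts falls in (lo, hi]."""
    i = bisect.bisect_right(ts, lo)
    return i < len(ts) and ts[i] <= hi

def filled_uptime(ts, now):
    """Simplified model of avg_over_time((up or 0 * sum_over_time(up[7d]))[7d:5m])."""
    values = []
    for t in range(now - WEEK + STEP, now + 1, STEP):
        if any_sample_in(ts, t - LOOKBACK, t):   # up present at this step
            values.append(1.0)
        elif any_sample_in(ts, t - WEEK, t):     # zero filler; reaches 7 more days back
            values.append(0.0)
    return sum(values) / len(values) if values else None

def with_and_filter(ts, now):
    """Models appending `and sum_over_time(up[7d])`: keep only series with samples in the last 7 days."""
    result = filled_uptime(ts, now)
    return result if any_sample_in(ts, now - WEEK, now) else None

now = 14 * DAY
# Hypothetical instance that was up for one hour, 10 days before `now`.
old_instance = list(range(4 * DAY, 4 * DAY + 3600, 15))

print(filled_uptime(old_instance, now))    # 0.0 -> a spurious result
print(with_and_filter(old_instance, now))  # None -> the series is dropped
```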

It is not an efficient query by any stretch of the imagination, but it does not require you to hardcode your scrape interval into the query. As requested. (o:


answered Sep 26 '22 18:09

Alin Sînpălean
