I have the following expression for an alert in Prometheus:
absent_over_time(node_filesystem_size_bytes{device=~"/dev/nvme.*"}[5m]) > 0
As far as I can tell, this will be > 0 if there are no metrics for /dev/nvme* in the last 5 minutes.
However, I would like this to take into account the instance that it runs on. That is, I want this to be triggered if any instance (preferably with a label saying which one(s)) is missing this metric over the last 5 minutes. I'm assuming that if one node is successfully scraped, this condition won't be true anymore.
There is an instance label on the node_filesystem_size_bytes metric, but if that metric is missing, I'm not sure how it could detect that.
Do I need to somehow get a list of instances at the current time and join it with node_filesystem_size_bytes to accomplish this? Or is there some other way?
This is using node-exporter and the kube-prometheus-stack in a Kubernetes cluster.
Try the following query:
(node_filesystem_size_bytes{device=~"/dev/nvme.*"} offset 5m)
unless
node_filesystem_size_bytes{device=~"/dev/nvme.*"}
It should return time series matching the node_filesystem_size_bytes{device=~"/dev/nvme.*"} with all their labels, which had raw samples 5 minutes ago, but have no new samples aftewards.
Note that this query can be simplified to the following one when using VictoriaMetrics - Prometheus-like system I work on:
lag(node_filesystem_size_bytes{device=~"/dev/nvme.*"}[10m]) > 5m
It uses the lag() function, which returns the duration since the last raw sample per every input time series.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With