Prometheus sends resolved notifications when metric data is missing

Question

We use Prometheus Alertmanager for alerts. Frequently, we are missing metrics because of some connection problems.

When metrics are missing, Prometheus clears the alerts and sends a resolved notification. A few minutes later the connection problem is fixed and the same alerts fire again.

Is there any way to suppress the resolved notifications when metric data is missing?

For example, when a node goes down, the other alerts for that node (CPU and disk usage checks) are resolved.

Values from our Alertmanager and Prometheus configs:

  repeat_interval: 1d
  resolve_timeout: 15m

  group_wait: 1m30s
  group_interval: 5m

  scrape_interval: 1m
  scrape_timeout: 1m 
  evaluation_interval: 30s

NodeDown alert:

  - alert: NodeDown
    expr: up == 0
    for: 30s
    labels:
      severity: critical
      alert_group: host
    annotations:
      summary: "Node is down: instance {{ $labels.instance }}"
    description: "Can't reach node_exporter at {{ $labels.instance }}. The host is probably down."

anemyte · Accepted Answer

Alertmanager can inhibit (that is, automatically silence) alerts under certain conditions. While an alert is inhibited, you will see neither its firing nor its resolved notifications until the inhibiting condition is false again. Here is an example of such a rule:

inhibit_rules:
- # Mute alerts whose "severity" label equals "warning" ...
  target_matchers:
  - severity = warning

  # ... when an alert named "ExporterDown" is firing ...
  source_matchers:
  - alertname = ExporterDown

  # ... if both the muted and the firing alerts have exactly the same "job" and "instance" labels.
  equal: [instance, job]

To summarize, the rule above automatically silences all warning alerts for a given machine while its metric source is down. The Alertmanager documentation on inhibit_rules covers the subject in more detail.
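
Applied to the NodeDown alert from the question, such a rule might look like the sketch below. It assumes that the CPU and disk alerts carry the same "instance" label as NodeDown. The target matcher "alertname != NodeDown" is an illustrative choice, not something taken from the asker's config; narrow it to the alerts you actually want muted.

```yaml
inhibit_rules:
  # While NodeDown is firing for an instance ...
  - source_matchers:
      - alertname = NodeDown
    # ... mute every other alert (assumption: nothing else should notify
    # for a host that is known to be down) ...
    target_matchers:
      - alertname != NodeDown
    # ... provided both alerts share the same "instance" label value.
    equal: [instance]
```

Inhibition only takes effect once NodeDown is actually firing, so the alert's "for" duration and Alertmanager's group_interval can influence whether a resolved notification still slips through right after the node disappears.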

Tags:

prometheus

prometheus-alertmanager

Melike Sozeri
