How to silence Prometheus Alertmanager using config files?

I'm using the official stable/prometheus-operator chart to deploy Prometheus with helm.

It's working well so far, except for the annoying CPUThrottlingHigh alert that is firing for many pods (including Prometheus' own config-reloader containers). This alert is currently under discussion, and I want to silence its notifications for now.

The Alertmanager has a silence feature, but it is web-based:

Silences are a straightforward way to simply mute alerts for a given time. Silences are configured in the web interface of the Alertmanager.

Is there a way to mute notifications from CPUThrottlingHigh using a config file?

Eduardo Baitello asked Feb 21 '19 11:02


3 Answers

One option is to route alerts you want silenced to a "null" receiver. In alertmanager.yaml:

route:
  # Other settings...
  group_wait: 0s
  group_interval: 1m
  repeat_interval: 1h

  # Default receiver.
  receiver: "null"

  routes:
  # continue defaults to false, so the first match will end routing.
  - match:
      # This was previously named DeadMansSwitch
      alertname: Watchdog
    receiver: "null"
  - match:
      alertname: CPUThrottlingHigh
    receiver: "null"
  - receiver: "regular_alert_receiver"

receivers:
  - name: "null"
  - name: regular_alert_receiver
    <snip>
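
To make the "first match ends routing" behaviour concrete, here is a minimal Python sketch of how the route tree above picks a receiver. It is only a simplified model, not Alertmanager's actual implementation: it handles equality matchers on one level of child routes and ignores grouping, match_re and continue: true. The KubePodCrashLooping alert name and the namespace label are just illustrative inputs.

```python
# Simplified model of the child routes from the config above.
# An empty "match" dict stands for the final catch-all route.
ROUTES = [
    {"match": {"alertname": "Watchdog"}, "receiver": "null"},
    {"match": {"alertname": "CPUThrottlingHigh"}, "receiver": "null"},
    {"match": {}, "receiver": "regular_alert_receiver"},
]
DEFAULT_RECEIVER = "null"  # receiver set on the top-level route


def pick_receiver(labels: dict) -> str:
    """Return the receiver of the first child route whose matchers all match."""
    for route in ROUTES:
        if all(labels.get(k) == v for k, v in route["match"].items()):
            # continue defaults to false, so routing stops at the first match.
            return route["receiver"]
    return DEFAULT_RECEIVER


print(pick_receiver({"alertname": "CPUThrottlingHigh", "namespace": "monitoring"}))
print(pick_receiver({"alertname": "KubePodCrashLooping"}))
```

In this model the first call lands on the "null" receiver and the second falls through to regular_alert_receiver, which is why the order of the entries under routes matters.
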
clay answered Oct 18 '22 19:10


I doubt there exists a way to silence alerts via configuration (other than routing said alerts to a /dev/null receiver, i.e. one with no email or any other notification mechanism configured, but the alert would still show up in the Alertmanager UI).

You can use the command-line tool amtool that ships with Alertmanager to add a silence. Its silence add subcommand sets an end time for the silence; in recent releases this is controlled with the --duration flag, which defaults to one hour.

Or you can use the API directly (even though it is not documented and in theory it may change). According to this prometheus-users thread this should work:

curl https://alertmanager/api/v1/silences -d '{
      "matchers": [
        {
          "name": "alername1",
          "value": ".*",
          "isRegex": true
        }
      ],
      "startsAt": "2018-10-25T22:12:33.533330795Z",
      "endsAt": "2018-10-25T23:11:44.603Z",
      "createdBy": "api",
      "comment": "Silence",
      "status": {
        "state": "active"
      }

}'
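
If you would rather script this than hand-edit timestamps, the request body can be generated. The following is a rough Python sketch, assuming an Alertmanager release that serves the v2 API at /api/v2/silences (the v1 path in the curl example may not exist on newer versions). The build_silence and post_silence helpers are names made up for this example, and the base URL passed to post_silence is a placeholder you would have to supply.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_silence(alertname: str, hours: float, now: datetime) -> dict:
    """Build a silence body that mutes one alertname for `hours` hours."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # RFC 3339 timestamp, expressed in UTC
    start = now.astimezone(timezone.utc)
    return {
        "matchers": [{"name": "alertname", "value": alertname, "isRegex": False}],
        "startsAt": start.strftime(fmt),
        "endsAt": (start + timedelta(hours=hours)).strftime(fmt),
        "createdBy": "api",
        "comment": "Silence",
    }


def post_silence(base_url: str, silence: dict) -> bytes:
    """POST the silence as JSON; base_url is a placeholder for your Alertmanager."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/silences",  # assumed v2 endpoint
        data=json.dumps(silence).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


# Only build and print the body here; nothing is sent over the network.
body = build_silence(
    "CPUThrottlingHigh", 2, datetime(2018, 10, 25, 22, 0, tzinfo=timezone.utc)
)
print(json.dumps(body, indent=2))
```

Unlike the regex matcher in the curl example, this body matches the alertname label exactly, so only CPUThrottlingHigh is silenced. Note that a silence created this way still expires, so it would need to be re-created periodically.
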
Alin Sînpălean answered Oct 18 '22 18:10


Well, I managed to get it working by configuring a hackish inhibit_rule:

inhibit_rules:
- target_match:
     alertname: 'CPUThrottlingHigh'
  source_match:
     alertname: 'DeadMansSwitch'
  equal: ['prometheus']

The DeadMansSwitch is, by design, an "always firing" alert shipped with prometheus-operator, and the prometheus label is a common label for all alerts, so the CPUThrottlingHigh ends up inhibited forever. It stinks, but works.
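
To illustrate why this works, here is a small Python sketch of the inhibition check as I understand it. It is a simplified model, not Alertmanager's code: it only handles equality matchers, and the label value monitoring/k8s and the pod name are made-up examples.

```python
# The inhibit rule from the config above, as plain data.
RULE = {
    "source_match": {"alertname": "DeadMansSwitch"},
    "target_match": {"alertname": "CPUThrottlingHigh"},
    "equal": ["prometheus"],
}


def matches(alert: dict, matcher: dict) -> bool:
    """True if the alert carries every label/value pair in the matcher."""
    return all(alert.get(k) == v for k, v in matcher.items())


def is_inhibited(target: dict, firing: list, rule: dict = RULE) -> bool:
    """True if some firing source alert mutes the target alert under the rule."""
    if not matches(target, rule["target_match"]):
        return False
    for source in firing:
        same_labels = all(source.get(l) == target.get(l) for l in rule["equal"])
        if matches(source, rule["source_match"]) and same_labels:
            return True
    return False


switch = {"alertname": "DeadMansSwitch", "prometheus": "monitoring/k8s"}
throttling = {
    "alertname": "CPUThrottlingHigh",
    "prometheus": "monitoring/k8s",
    "pod": "example-pod",
}

print(is_inhibited(throttling, [switch]))  # source is firing
print(is_inhibited(throttling, []))        # source is not firing
```

The first call reports the alert as inhibited and the second does not, which mirrors the weakness of the approach: the mute only holds while an alert with the expected source alertname is actually firing.
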

Pros:

  • This can be done via the config file (using the alertmanager.config helm parameter).
  • The CPUThrottlingHigh alert is still present on Prometheus for analysis.
  • The CPUThrottlingHigh alert only shows up in the Alertmanager UI if the "Inhibited" box is checked.
  • No annoying notifications on my receivers.

Cons:

  • Any change to DeadMansSwitch or to the prometheus label design will break this (although that only means the notifications start firing again).

Update: My Cons became real...

The DeadMansSwitch alertname was just changed in stable/prometheus-operator 4.0.0. If you are using this version (or above), the new alertname is Watchdog.

Eduardo Baitello answered Oct 18 '22 18:10