Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

GCP Alert Filters Don't Affect Open Incidents

I have an alert that I have configured to send email when the sum of executions of cloud functions that have finished in status other than 'error' or 'ok' is above 0 (grouped by the function name).

The way I defined the alert is:

alert definition

And the secondary aggregator is delta.

The problem is that once the alert is open, it looks like the filters don't matter any more, and the alert stays open because it sees that the cloud function is triggered and finishes with any status (even 'ok' status keeps it open as long as its triggered enough).

ATM the only solution I can think of is to define a log based metric that will count it itself and then the alert will be based on that custom metric instead of on the built in one.

Is there something that I'm missing?

Edit:

Adding another image to show what I think might be the problem: incident

From the image above we see that the graph wont go down to 0 but will stay at 1, which is not the way other normal incidents work

like image 750
Max Avatar asked Jun 21 '21 08:06

Max


People also ask

What is not a recommended best practice for alerts in GCP?

What is not a recommended best practice for alerts? []Configure alerting on symptoms and not necessarily causes. []Use multiple notification channels so you avoid a single point of failure.

How do I trigger alerts in GCP?

You can add a log-based alerting policy to your Google Cloud project by using the Logs Explorer in Cloud Logging or by using the Monitoring API. This content does not apply to log-based alerting policies. For information about log-based alerting policies, see Monitoring your logs.

What is alignment period?

The alignment period is a look-back interval from a particular point in time; the aligner is the function that combines the points into the look-back interval into an aligned value.

What features does Google Cloud have to provide monitoring for their cloud?

Key Features: The system includes an autodiscovery feature and that is able to detect Google Compute Engine (GCE), Google App Engine (GAE), Google Kubernetes Engine, VPC, Cloud IAM, Cloud Audit Logging, Cloud SQL, and BigQuery instances, among others. The system can also track AWS and Azure services.


1 Answers

According to the official documentation:

"Monitoring automatically closes an incident when it observes that the condition is no longer met or when 7 days have passed without an observation that the condition is still being met."

That made me think that there are times where the condition is not relevant to make it close the incident. Which is confirmed here:

"If measurements are missing (for example, if there are no HTTP requests for a couple of minutes), the policy uses the last recorded value to evaluate conditions."

The lack of HTTP requests aren't a reason to close the metric as it keeps using the last recorded value (that triggered the metric).

So, using alerts for Http Requests is fine but you need to close them by yourself. Although I think it would be better to use a custom metric instead if you want them to be disabled automatically.

like image 147
Anton L Avatar answered Oct 08 '22 22:10

Anton L