I have alarms set up to tell me when my load balancers are throwing 5xxs, using the HTTPCode_Backend_5XX metric with the Sum statistic. The issue is that Sum registers 0 as no data points, so when no 5xxs are thrown, the alarm is treated as insufficient data. This is especially frustrating because I have SNS set up to notify me whenever we get too many 5xxs (ALARM state) and whenever things go back to normal (OK state). Annoyingly, zero 5xxs means we're in INSUFFICIENT_DATA status, but a single 5xx means we're in OK status, so one 5xx triggers everyone getting notified that everything is fine. Is there any way around this? Ideally, I'd like 0 of anything to show up as a zero data point instead of no data at all (INSUFFICIENT_DATA).
As of March 2017, you can configure how an alarm treats missing data. Treating missing data as not breaching (i.e., acceptable) prevents the alarm from being marked as INSUFFICIENT_DATA.
You can also set this in CloudFormation using the TreatMissingData property. For example:
TreatMissingData: notBreaching
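For context, here is a minimal sketch of how that property sits inside a full AWS::CloudWatch::Alarm resource for a Classic Load Balancer; the resource name, load balancer name, threshold, and SNS topic reference below are placeholders, not values from the question:

Backend5xxAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    Namespace: AWS/ELB
    MetricName: HTTPCode_Backend_5XX
    Dimensions:
      - Name: LoadBalancerName
        Value: my-load-balancer          # placeholder
    Statistic: Sum
    Period: 60
    EvaluationPeriods: 1
    Threshold: 10                        # placeholder threshold
    ComparisonOperator: GreaterThanOrEqualToThreshold
    TreatMissingData: notBreaching       # no data points => alarm evaluates as OK, not INSUFFICIENT_DATA
    AlarmActions:
      - !Ref AlarmTopic                  # SNS topic (placeholder)
    OKActions:
      - !Ref AlarmTopic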
We had a similar issue for some of our alarms. You can actually avoid this behaviour with some work, if you really want to deal with the overhead.
What we did, instead of sending SNS notifications directly to e-mail, was create a Lambda function and subscribe it to the SNS topic so it is triggered whenever a notification arrives.
This way you have more control over what happens when an alarm changes state, because the notification message includes the old state value (OldStateValue) as well as the new one.
The good news is that there is already a Lambda blueprint to get you started: https://aws.amazon.com/blogs/aws/new-slack-integration-blueprints-for-aws-lambda/
Just pick the one designed to send CloudWatch alarms to Slack. You can then modify the code as you wish: drop the Slack part and just send e-mails, or keep Slack (which is what we did, and it works like a charm). A rough sketch of this kind of handler follows below.
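To give an idea of what that function can look like, here is a rough Python sketch, not the blueprint code itself; the notify helper is a placeholder for whatever e-mail or Slack call you use. It reads OldStateValue and NewStateValue from the SNS message and skips the INSUFFICIENT_DATA -> OK transition that was causing the noise:

import json

def lambda_handler(event, context):
    # CloudWatch alarm notifications arrive as a JSON string inside the SNS record
    message = json.loads(event['Records'][0]['Sns']['Message'])

    alarm_name = message.get('AlarmName')
    old_state = message.get('OldStateValue')
    new_state = message.get('NewStateValue')

    # Only announce "back to normal" if we were actually in ALARM before;
    # this suppresses the INSUFFICIENT_DATA -> OK notifications
    if new_state == 'OK' and old_state != 'ALARM':
        print(f"Skipping {alarm_name}: {old_state} -> {new_state}")
        return

    notify(alarm_name, old_state, new_state)

def notify(alarm_name, old_state, new_state):
    # Placeholder: replace with SES, a Slack webhook, or another SNS publish
    print(f"{alarm_name} changed from {old_state} to {new_state}")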