CloudWatch Alarm Percentage of errors API Gateway

I'm trying to set up an alarm in CloudWatch using Terraform. The alarm needs to check whether more than 5% of requests to the gateway return 5xx errors over 2 consecutive periods of 1 minute.

I've tried the following code but it's not working:

resource "aws_cloudwatch_metric_alarm" "gateway_error_rate" {
  alarm_name          = "gateway-errors"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  alarm_description   = "Gateway error rate has exceeded 5%"
  treat_missing_data  = "notBreaching"
  metric_name         = "5XXError"
  namespace           = "AWS/ApiGateway"
  period              = 60
  evaluation_periods  = 2
  threshold           = 5
  statistic           = "Average"
  unit                = "Percent"

  dimensions = {
    ApiName = "my-api"
    Stage   = "dev"
  }
}

Even though the alarm is deployed, no data is displayed. After some testing, I noticed that the unit "Percent" is apparently not accepted for this alarm.

Does anyone have an example in Terraform or CloudFormation of how to configure this type of alarm?

asked Jul 03 '20 05:07 by Jaime S


1 Answer

Based on the information Marcin provided in the comments, I found this in the AWS documentation:

The Average statistic represents the 5XXError error rate, namely, the total count of the 5XXError errors divided by the total number of requests during the period. The denominator corresponds to the Count metric (below).

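Because the Average statistic is already a fraction between 0 and 1, a 5% error rate corresponds to a threshold of 0.05, and the unit has to be "Count" rather than "Percent". My alarm configured in Terraform looks as follows: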

resource "aws_cloudwatch_metric_alarm" "gateway_error_rate" {
  alarm_name          = "gateway-errors"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  alarm_description   = "Gateway error rate has exceeded 5%"
  treat_missing_data  = "notBreaching"
  metric_name         = "5XXError"
  namespace           = "AWS/ApiGateway"
  period              = 60
  evaluation_periods  = 2
  threshold           = 0.05
  statistic           = "Average"
  unit                = "Count"

  dimensions = {
    ApiName = "my-api"
    Stage   = "dev"
  }
}
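
If you would rather keep the threshold as a literal percentage (5 instead of 0.05), one alternative is CloudWatch metric math. The sketch below is not taken from the documentation quoted above; it assumes the same API name and stage as the alarm above, and the resource name, alarm name, and query ids (error_percent, errors, requests) are made up for illustration. It divides the sum of 5XXError by the sum of Count and multiplies by 100:

```hcl
# Sketch only: resource name, alarm name, and query ids are hypothetical.
resource "aws_cloudwatch_metric_alarm" "gateway_error_percent" {
  alarm_name          = "gateway-errors-percent"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  alarm_description   = "Gateway 5XX error rate has exceeded 5%"
  treat_missing_data  = "notBreaching"
  evaluation_periods  = 2
  threshold           = 5 # now a percentage, because the expression multiplies by 100

  # The expression the alarm actually evaluates.
  metric_query {
    id          = "error_percent"
    expression  = "(errors / requests) * 100"
    label       = "5XX error rate (%)"
    return_data = true
  }

  # Number of 5XX responses per 1-minute period.
  metric_query {
    id = "errors"

    metric {
      metric_name = "5XXError"
      namespace   = "AWS/ApiGateway"
      period      = 60
      stat        = "Sum"

      dimensions = {
        ApiName = "my-api"
        Stage   = "dev"
      }
    }
  }

  # Total number of requests per 1-minute period.
  metric_query {
    id = "requests"

    metric {
      metric_name = "Count"
      namespace   = "AWS/ApiGateway"
      period      = 60
      stat        = "Sum"

      dimensions = {
        ApiName = "my-api"
        Stage   = "dev"
      }
    }
  }
}
```

With metric_query blocks, the top-level metric_name, namespace, period, statistic, and unit arguments are left out, since each metric block carries its own. Periods with no requests should produce no data point for the expression, which treat_missing_data = "notBreaching" then treats as healthy, but it is worth confirming that behavior on your own API before relying on it.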

answered Sep 17 '22 15:09 by Jaime S