
How to listen for an "Insufficient cpu/memory" event in an AWS ECS service?

I am interested in listening for and reacting to the event in which a service cannot start a task because of insufficient CPU or memory. This information can be viewed in the console if I choose the specific service and look in its "Events" tab. There, an event like the following is displayed:

"service X was unable to place a task because no container instance met all of its requirements. The closest matching container-instance Y has insufficient CPU units available. For more information, see the Troubleshooting section."

The container instances in the cluster are managed in an AutoScalingGroup, so the appropriate action would be to react to this event by scaling out with an additional instance, which would then allow the task to be scheduled. Now, my problem is: how do I react to this event?

I have a LogGroup that contains data from the following files from all the EC2 instances in the cluster:

  • /var/log/dmesg
  • /var/log/messages
  • /var/log/docker
  • /var/log/ecs/ecs-init.log.*
  • /var/log/ecs/ecs-agent.log.*

(The EC2 instances are based on amazon-ecs-optimized images)

Initially, I thought that the "service X was unable to place a task..." message would appear in one of these log files (more specifically in ecs-agent.log or ecs-init.log), but that was not the case.

I then realized that "ECS Events" is a thing (see more at http://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_cwe_events.html). Unfortunately, this specific event is not one that "ECS Events" supports. Only Container Instance State Change Events and Task State Change Events are supported, NOT "Service State Change Events". Even though one would think that the events from the service's "Events" tab would be streamed as well, they are not. I came to realize that the documentation even says so:

"You can use Amazon ECS event stream for CloudWatch Events to receive near real-time notifications regarding the current state of both the container instances within an Amazon ECS cluster, and the current state of all tasks running on those container instances."

Thus, "Amazon ECS Event Stream for CloudWatch Events" is not streaming service events (and therefore not events for tasks that are prevented from running). I really hope that "Service State Change Events" will be included in the future. That way I could make a CloudWatch Event Rule that matches this event and triggers a Lambda function, which would then determine whether the event was of type "service X was unable to place a task..." and, based on that, manipulate the AutoScalingGroup to scale out by adding an instance to the cluster.

But as stated, this is not supported at the moment. Is there any other way that I can "listen" for this event? I even thought about running a Lambda every 2-3 minutes that uses the CLI to invoke "aws ecs describe-services --services X" to output the list of events, and then match on the "service X was unable to place a task..." event. But that just seems wrong...
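For reference, the polling idea could look roughly like the sketch below. It assumes the response has the shape returned by boto3's ecs.describe_services (a "services" list whose entries carry "events" with "createdAt" and "message"); the function name, the 3-minute window, and the sample data are made up for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches the placement-failure text shown in the service's "Events" tab.
PLACEMENT_FAILURE = re.compile(r"was unable to place a task")


def recent_placement_failures(response, now, window=timedelta(minutes=3)):
    """Return placement-failure events created within `window` before `now`.

    `response` is assumed to look like the result of
    ecs.describe_services(cluster=..., services=[...]).
    """
    cutoff = now - window
    failures = []
    for service in response.get("services", []):
        for event in service.get("events", []):
            is_recent = event["createdAt"] >= cutoff
            if is_recent and PLACEMENT_FAILURE.search(event["message"]):
                failures.append(event)
    return failures


# Hypothetical Lambda wiring (needs boto3 and real resource names):
#   ecs = boto3.client("ecs")
#   response = ecs.describe_services(cluster="my-cluster", services=["X"])
#   if recent_placement_failures(response, datetime.now(timezone.utc)):
#       ...raise the Auto Scaling group's desired capacity...

# Made-up sample data, only to show the filtering.
now = datetime(2017, 2, 22, 14, 0, tzinfo=timezone.utc)
sample = {
    "services": [
        {
            "events": [
                {"id": "a", "createdAt": now - timedelta(minutes=1),
                 "message": "(service X) has reached a steady state."},
                {"id": "b", "createdAt": now - timedelta(minutes=2),
                 "message": "(service X) was unable to place a task because "
                            "no container instance met all of its requirements."},
                {"id": "c", "createdAt": now - timedelta(minutes=30),
                 "message": "(service X) was unable to place a task because "
                            "no container instance met all of its requirements."},
            ]
        }
    ]
}
print([event["id"] for event in recent_placement_failures(sample, now)])
```

The time window is there so that an old failure still present in the event list is not acted on again at every poll; it should roughly match the polling interval.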

Any help is much appreciated. Thanks!

Frederik Nygaard Svendsen asked Feb 22 '17 14:02



1 Answer

I am able to get these events with a CloudWatch event rule. I created a CloudWatch event rule with a target, using "Part of the matched event" with the path below to filter these events.

$.detail.responseElements.service.events

These events come from the "AWS API Call via CloudTrail" event type.

Steps:

  1. Create an event rule with the following event pattern configuration:

    Service Name: ECS
    Event Type: AWS API Call via CloudTrail

  2. Choose Target: SNS

  3. Configure input - Part of the matched event -> enter $.detail.responseElements.service.events into the box.

  4. Create the rule.
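The same steps can presumably be scripted instead of clicked through. The sketch below builds the event pattern that the console selection "ECS / AWS API Call via CloudTrail" is assumed to correspond to, and an SNS target that uses the input path from step 3. The put_rule and put_targets calls are those of boto3's CloudWatch Events client; the helper names, rule name, target id, and topic ARN are hypothetical.

```python
import json


def build_event_pattern():
    """Event pattern assumed to match ECS API calls recorded by CloudTrail."""
    return {
        "source": ["aws.ecs"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {"eventSource": ["ecs.amazonaws.com"]},
    }


def build_sns_target(topic_arn):
    """SNS target that forwards only the service events list (step 3)."""
    return {
        "Id": "ecs-service-events",  # hypothetical target id
        "Arn": topic_arn,
        "InputPath": "$.detail.responseElements.service.events",
    }


def create_rule(events_client, rule_name, topic_arn):
    """Create the rule and attach the target.

    `events_client` is expected to be boto3.client("events").
    """
    events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_event_pattern()),
        State="ENABLED",
    )
    events_client.put_targets(Rule=rule_name, Targets=[build_sns_target(topic_arn)])


# Hypothetical usage (needs boto3, credentials, and an existing SNS topic):
#   import boto3
#   create_rule(boto3.client("events"), "ecs-service-events-rule",
#               "arn:aws:sns:eu-west-1:123456789012:my-topic")
print(json.dumps(build_event_pattern(), indent=2))
```

Note that the SNS topic's access policy also has to allow CloudWatch Events to publish to it, which the console sets up for you but a script would have to handle separately.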

The JSON after filtering looks like this:

[
  {
    "id": "fb7dbb37-ff2a-443c-b414-1ead7276f550",
    "createdAt": "Oct 18, 2018 7:24:16 AM",
    "message": "(service sample) has reached a steady state."
  },
  {
    "id": "598dbdc0-e1b5-4673-8d5c-0b531d349789",
    "createdAt": "Oct 18, 2018 1:24:11 AM",
    "message": "(service sample) has reached a steady state."
  },
  {
    "id": "5aa89799-c661-4f6c-bbf0-8e7c93dfa31e",
    "createdAt": "Oct 17, 2018 7:24:04 PM",
    "message": "(service sample) has reached a steady state."
  },
  {
    "id": "db535112-786d-4090-9855-147a7301761b",
    "createdAt": "Oct 17, 2018 1:23:34 PM",
    "message": "(service sample) has reached a steady state."
  },
  {
    "id": "15e4b4d7-8cb7-4fd7-b616-bec0fdbc5e6c",
    "createdAt": "Oct 17, 2018 1:01:35 PM",
    "message": "(service sample) was unable to place a task because no container instance met all of its requirements. The closest matching (container-instance 05016874-f518-4b7a-a817-eb32a4d387f1) has insufficient memory available. For more information, see the Troubleshooting section of the Amazon ECS Developer Guide."
  },
  {
    "id": "f744736c-6213-40bc-aee4-9e928f9be263",
    "createdAt": "Oct 17, 2018 1:01:26 PM",
    "message": "(service sample) has started 1 tasks: (task 3af3f916-1d6f-4543-a179-c2b06da8487e)."
  },
  {
    "id": "3af31b15-1386-4fd5-be80-42b7e4cdce54",
    "createdAt": "Oct 17, 2018 12:51:35 PM",
    "message": "(service sample) was unable to place a task because no container instance met all of its requirements. The closest matching (container-instance 05016874-f518-4b7a-a817-eb32a4d387f1) has insufficient memory available. For more information, see the Troubleshooting section of the Amazon ECS Developer Guide."
  }
]
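To act on a list like the one above, a Lambda function subscribed to the SNS topic could pick out the placement failures and see which resource was short. This is only a sketch: the handler and helper names are hypothetical, the regular expression is derived from the two message formats shown in this question and answer, and it assumes the SNS message body is the JSON array itself.

```python
import json
import re

# Derived from the messages above: captures the service name, the closest
# container instance, and the resource that was insufficient.
INSUFFICIENT = re.compile(
    r"\(service (?P<service>[^)]+)\) was unable to place a task .*"
    r"\(container-instance (?P<instance>[^)]+)\) "
    r"has insufficient (?P<resource>.+?) available"
)


def placement_failures(events):
    """Return one dict per event whose message reports a failed placement."""
    found = []
    for event in events:
        match = INSUFFICIENT.search(event.get("message", ""))
        if match:
            found.append({"id": event["id"], **match.groupdict()})
    return found


def handler(sns_event, context=None):
    """Hypothetical Lambda entry point for an SNS subscription."""
    failures = []
    for record in sns_event["Records"]:
        # SNS delivers the message body as a string.
        payload = json.loads(record["Sns"]["Message"])
        failures.extend(placement_failures(payload or []))
    # A real handler would decide here whether to raise the Auto Scaling
    # group's desired capacity; this sketch only returns what it found.
    return failures


# Two of the events from the output above, wrapped the way SNS hands them to Lambda.
events = [
    {"id": "fb7dbb37-ff2a-443c-b414-1ead7276f550",
     "createdAt": "Oct 18, 2018 7:24:16 AM",
     "message": "(service sample) has reached a steady state."},
    {"id": "15e4b4d7-8cb7-4fd7-b616-bec0fdbc5e6c",
     "createdAt": "Oct 17, 2018 1:01:35 PM",
     "message": "(service sample) was unable to place a task because no "
                "container instance met all of its requirements. The closest "
                "matching (container-instance 05016874-f518-4b7a-a817-eb32a4d387f1) "
                "has insufficient memory available. For more information, see the "
                "Troubleshooting section of the Amazon ECS Developer Guide."},
]
sns_event = {"Records": [{"Sns": {"Message": json.dumps(events)}}]}
print(handler(sns_event))
```

Because each payload repeats older events, a real handler would also want to remember which event ids it has already acted on so the same failure does not trigger scaling twice.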
mohit answered Oct 23 '22 22:10
