
AWS EC2 High CPU alarms going off


I have a micro EC2 instance running Windows 2008 R2. I've been getting a lot of high CPU alarms lately, and when I log into the AWS Management Console I see that my CPU is practically pegged at 100%. However, if I log into the instance and pull up Task Manager, the CPU looks like it's practically idling. I left Task Manager open for a while and took this screenshot showing the difference between what AWS is reporting and what my instance looks like it's doing. Suggestions?

CPU Usage Graph (https://s3.amazonaws.com/caskerdbbucket/public/cpu.png)

PS: The update speed in Task Manager is set to "Low".

Ryan Caskey asked Jul 28 '12 17:07


1 Answer

The data exposed by the operating system is often insufficient or misleading in virtualized environments like Amazon EC2. The reported percentage depends on your instance type and on the underlying processor core utilization (which usually doesn't match the virtualized hardware the hypervisor presents to you), amongst other things. What you are seeing is most likely caused by CPU steal time. Windows unfortunately does not expose it (see my question Is there a Windows equivalent of Unix 'CPU steal time'? for more on this problem), but most Unix/Linux monitoring tools do nowadays, e.g. in the %steal column of sar or the st column of top:

st -- Steal Time
The amount of CPU 'stolen' from this virtual machine by the hypervisor for other tasks (such as running another virtual machine).
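
On a Linux guest, the same figure can be derived by hand from the aggregate cpu line of /proc/stat, where steal is the eighth counter. The following is a minimal sketch rather than a reference implementation: the function names are made up for illustration, and it assumes a kernel recent enough to report the steal field.

```python
import time


def read_cpu_times(stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat content into a list of ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before, after):
    """Return steal time as a percentage of all CPU time between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Field order: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already counted in user/nice, so only the first 8 are summed.
    total = sum(deltas[:8])
    if total <= 0 or len(deltas) < 8:
        return 0.0
    return 100.0 * deltas[7] / total


def sample_steal(interval=1.0, path="/proc/stat"):
    """Hypothetical helper: sample a live Linux system twice, `interval` seconds apart."""
    with open(path) as f:
        before = read_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = read_cpu_times(f.read())
    return steal_percent(before, after)
```

Calling sample_steal() on a Linux instance should give roughly what top shows in its st column over the same interval. None of this helps on Windows, where no equivalent counter is exposed.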

The blog post EC2 monitoring: the case of stolen CPU provides a nice exploration and illustration of this topic:

When the top command displays 40% CPU busy but CloudWatch says the server is maxed out at 100% - which side do you take? The answer is simple (CloudWatch is correct, top is not) [...]
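
One way to make sense of the two numbers is to ask how much of the CPU time the instance was actually given ended up busy. The sketch below is only an illustration of that arithmetic. The 60% steal figure in the usage note is an assumption, not a value taken from the quoted post, and the function (a hypothetical name) does not claim to reproduce how CloudWatch computes its metric.

```python
def busy_share_of_available(busy_pct, steal_pct):
    """Return busy time as a percentage of the CPU time that was not stolen.

    Both arguments are percentages of wall-clock CPU time as seen in the guest.
    """
    available = 100.0 - steal_pct
    if available <= 0:
        raise ValueError("no CPU time was available to the guest")
    # Cap at 100 to absorb rounding noise in the sampled inputs.
    return min(100.0, 100.0 * busy_pct / available)
```

With an assumed 60% steal, busy_share_of_available(40, 60) evaluates to 100.0: the guest looks 40% busy from the inside while using everything it was allowed to use.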

CPU steal time is particularly prevalent on the t1.micro instance type you are using, which can get heavily throttled by design (usually ~97% steal time!). See Micro Instances for an extensive explanation and illustration of the concept; specifically, the section When the Instance Uses Its Allotted Resources states:

We expect your application to consume only a certain amount of CPU resources in a period of time. If the application consumes more than your instance's allotted CPU resources, we temporarily limit the instance so it operates at a low CPU level. If your instance continues to use all of its allotted resources, its performance will degrade. We will increase the time we limit its CPU level, thus increasing the time before the instance is allowed to burst again. [emphasis mine]

Accordingly, you might have outgrown the sustainable CPU usage profile for micro instances and either need to adjust your workload or switch to another instance type.

Steffen Opel answered Sep 17 '22 15:09