I'm running an AWS EC2 m5.large (a none burstable instance). I have setup one of AWS CloudWatch's default metrics (CPU %) + some custom metrics (memory + disk usage) in my dashboard.
But when I compare the numbers CloudWatch report to me they are pretty far from then actually usage of the Ubuntu 20.04 server when I log in to it...
Actual usage:
CPU: ~ 35 %
Memory: ~ 33 %
CloudWatch report:
CPU ~ 10 %
Memory: ~ 50-55
https://www.screencast.com/t/o1nAnOFjVZW
I have followed AWS own instructions to add the metrics for memory and disk usage (Because CloudWatch doesn't out of the box have access to O/S level stuff): https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/mon-scripts.html
When numbers are so far from each other - then it would be impossible to setup useful alarms and notifications. I can't believe that is what AWS wants to provide to the people who chose to followed their original instructions? The only thing with match exactly is the disk usage %.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/download-cloudwatch-agent-commandline.html
1. sudo wget https://s3.amazonaws.com/amazoncloudwatch-agent/debian/amd64/latest/amazon-cloudwatch-agent.deb
2. sudo dpkg -i -E ./amazon-cloudwatch-agent.deb
3. sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
4. Go through all the steps in the wizard (The result is saved here: /opt/aws/amazon-cloudwatch-agent/bin/config.json)
Hint: I answered:
- Default to most questions and otherwise:
- NO --> Do you want to store the config in the SSM parameter store? (Because when I answered YES it failed later on because of some permission-issue and I didn't know how to make it happy and I don't think I need SSM in regards to this)
- YES --> Do you want to turn on StatsD daemon?
- YES --> Do you want to monitor metrics from CollectD?
- NO --> Do you have any existing CloudWatch Log Agent?
Now to prevent this error: Error parsing /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml, open /usr/share/collectd/types.db: no such file or directory https://github.com/awsdocs/amazon-cloudwatch-user-guide/issues/1
5. sudo mkdir -p /usr/share/collectd/
6. sudo touch /usr/share/collectd/types.db
7. sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s
8. /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status
{
"status": "running",
"starttime": "2020-06-07T10:04:41+00:00",
"version": "1.245315.0"
}
I realized - that the second I login to the machine the CPU % usage goes up from 10 % to 30% and stays there (of course some increase was to be expected - but not that much in my opinion) which in my case explains the big difference earlier...I honestly don't now if this way in more precise than the older script - but this should be the right way to do it in year 2020 :-) And you get access to 179 custom metrics when selecting "Advanced" during the wizard (even though only few would be valuable to most people)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With