CloudWatch does not aggregate across dimensions for your custom metrics

Tags: amazon-web-services, amazon-cloudwatch, amazon-cloudwatch-metrics

Reading the docs, I saw this statement:

CloudWatch does not aggregate across dimensions for your custom metrics

That seems like a HUGE limitation, right? In my estimation it would make custom metrics all but useless, so I want to confirm I'm understanding this correctly.

For example, say I have a custom metric that I ship from multiple servers. I want to see it per server, but I also want to see all servers together. Would I have no way of aggregating that across all the servers? Or would I be forced to create two custom metrics, one per server and one for all servers, and double-post from each server to both its own metric AND the aggregate one?


asked Jan 25 '18 13:01

red888

1 Answer

The docs are correct, CloudWatch won't aggregate across dimensions for your custom metrics (it will do so for some metrics published by other services, like EC2).

This feature may seem useful and obvious for your use case, but it's not clear how such aggregation should behave in the general case. CloudWatch allows up to 10 dimensions per metric, so aggregating across every combination of them could produce a lot of useless metrics, all of which you would be billed for. People also use dimensions to separate data that should never be combined: for example, splitting metrics between Test and Prod stacks, which are completely separate, so aggregating them would not make sense.
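To put a rough number on "a lot of useless metrics": if CloudWatch materialised an aggregate for every subset of a metric's dimensions, n dimensions would mean 2^n dimension combinations per metric, before even multiplying by the distinct values each dimension takes. The sketch below just counts those subsets; the dimension names are hypothetical placeholders.

```python
from itertools import combinations


def dimension_subsets(dimension_names):
    """Return every subset of dimension names (including the empty one) that a
    hypothetical auto-aggregating CloudWatch would have to roll up."""
    subsets = []
    for size in range(len(dimension_names) + 1):
        subsets.extend(combinations(dimension_names, size))
    return subsets


# Ten hypothetical dimension names, matching the 10-dimension limit mentioned above.
dims = [f"Dim{i}" for i in range(10)]
subsets = dimension_subsets(dims)
print(len(subsets))  # 1024 combinations for a single metric name
```

Each of those combinations would be a separately billed series, which is one plausible reason CloudWatch leaves the choice of aggregates to you.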

CloudWatch treats a metric name plus its full set of dimensions as a unique metric identifier. In your case, this means you need to publish each observation separately to every metric you want it to contribute to.

Let's say you have a metric named Latency, and you're putting a hostname in a dimension called Server. If you have three servers, this creates three metrics:

Latency, Server=server1
Latency, Server=server2
Latency, Server=server3
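A toy in-memory model of that identity rule (not CloudWatch's actual implementation) makes the consequence concrete: each distinct dimension set gets its own key, and nothing ever lands under "Latency with no dimensions" unless someone publishes there. The namespace `SomeNamespace` and the latency values are made up for illustration.

```python
from collections import defaultdict


def metric_key(namespace, name, dimensions):
    """Model a metric's identity: namespace + name + full dimension set.
    Dimension order doesn't matter, so the dimensions are stored as a frozenset."""
    return (namespace, name, frozenset(dimensions.items()))


store = defaultdict(list)


def put(namespace, name, dimensions, value):
    """Append a datapoint to exactly one series: the one matching all dimensions."""
    store[metric_key(namespace, name, dimensions)].append(value)


for server, latency in [("server1", 120), ("server2", 80), ("server3", 95)]:
    put("SomeNamespace", "Latency", {"Server": server}, latency)

print(len(store))  # 3 distinct metrics
# No series exists for Latency without the Server dimension:
print(metric_key("SomeNamespace", "Latency", {}) in store)  # False
```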

So the approach you mentioned in your question will work. If you also want a metric showing the data across all servers, each server needs to publish its data a second time to a separate metric. The best way to do that is to use a new common value for the Server dimension, something like AllServers. This results in four metrics:

Latency, Server=server1 <- only server1 data
Latency, Server=server2 <- only server2 data
Latency, Server=server3 <- only server3 data
Latency, Server=AllServers <- data from all 3 servers
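As a sketch of the double-publishing side, the helper below builds the `MetricData` entries one server would send: one for its own series and one for the shared `AllServers` series. The entry shape follows boto3's `put_metric_data` request; the helper name, namespace and values are hypothetical, and the actual API call is shown only as a comment so the snippet runs without AWS credentials.

```python
def latency_metric_data(server_name, latency_ms):
    """Build two MetricData entries for one observation: the server's own
    series and the shared AllServers series (hypothetical helper)."""
    entries = []
    for server_value in (server_name, "AllServers"):
        entries.append({
            "MetricName": "Latency",
            "Dimensions": [{"Name": "Server", "Value": server_value}],
            "Value": latency_ms,
            "Unit": "Milliseconds",
        })
    return entries


data = latency_metric_data("server1", 120.0)
# With boto3, both entries can go out in a single request, e.g.:
#   boto3.client("cloudwatch").put_metric_data(Namespace="SomeNamespace", MetricData=data)
for entry in data:
    print(entry["Dimensions"][0]["Value"])
```

Batching both entries into one request keeps the "double post" to a single API call per observation.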

Update 2019-12-17

Using the metric math SEARCH function: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html

This gives you per-server latency and latency across all servers without publishing a separate AllServers metric, and if a new server shows up, the expression picks it up automatically:

Graph source:

{
    "metrics": [
        [ { "expression": "SEARCH('{SomeNamespace,Server} MetricName=\"Latency\"', 'Average', 60)", "id": "e1", "region": "eu-west-1" } ],
        [ { "expression": "AVG(e1)", "id": "e2", "region": "eu-west-1", "label": "All servers", "yAxis": "right" } ]
    ],
    "view": "timeSeries",
    "stacked": false,
    "region": "eu-west-1"
}
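Here `e1` returns one Average series per server, and `AVG(e1)` collapses them into a single series point by point. The sketch below emulates that locally to show one subtlety: the "All servers" line is an unweighted mean of per-server averages, so a quiet server counts as much as a busy one. Skipping servers that have no datapoint at a given timestamp is an assumption about how gaps are treated, not documented CloudWatch behaviour.

```python
def avg_across_series(series_by_server):
    """Emulate AVG() over an array of time series: one output point per
    timestamp, averaging the series that have a datapoint there
    (assumption: servers with no datapoint at that timestamp are skipped)."""
    timestamps = sorted({t for series in series_by_server.values() for t in series})
    result = {}
    for t in timestamps:
        values = [s[t] for s in series_by_server.values() if t in s]
        result[t] = sum(values) / len(values)
    return result


# Per-server Average latency (ms) at two 60-second timestamps.
series = {
    "server1": {0: 100.0, 60: 110.0},
    "server2": {0: 200.0, 60: 190.0},
}
print(avg_across_series(series))  # {0: 150.0, 60: 150.0}

# If server1 served 1 request at 100 ms and server2 served 99 requests at 200 ms
# in the first minute, the true pooled average is much higher than 150:
pooled = (1 * 100.0 + 99 * 200.0) / 100
print(pooled)  # 199.0
```

If a request-weighted overall average matters, publishing Sum and SampleCount per server and dividing the totals would avoid this skew.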

The result is a graph like this:

[image: graph of the search expression]

Downsides of this approach:

Expressions are limited to 100 metrics.
Overall aggregation is limited to available metric math functions, which means percentiles are not available as of 2019-12-17.
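The percentile limitation isn't just a missing function: a percentile of the combined data generally can't be reconstructed from per-server percentiles. The made-up latencies below use the median (p50) as a stand-in for any percentile.

```python
import statistics

# Hypothetical raw latencies (ms) from two servers in the same period.
server1 = [10, 11, 12]
server2 = [100, 101, 102, 103, 104]

# Averaging each server's p50 (what an expression over per-server stats could do):
medians = [statistics.median(server1), statistics.median(server2)]
print(statistics.mean(medians))  # 56.5

# The actual p50 across all raw datapoints:
print(statistics.median(server1 + server2))  # 100.5
```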

Using Contributor Insights (open preview as of 2019-12-17): https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContributorInsights.html

If you publish your logs to CloudWatch Logs in JSON or Common Log Format (CLF), you can create rules that keep track of top contributors. For example, a rule that keeps track of servers with latencies over 400 ms would look something like this:

{
    "Schema": {
        "Name": "CloudWatchLogRule",
        "Version": 1
    },
    "AggregateOn": "Count",
    "Contribution": {
        "Filters": [
            {
                "Match": "$.Latency",
                "GreaterThan": 400
            }
        ],
        "Keys": [
            "$.Server"
        ],
        "ValueOf": "$.Latency"
    },
    "LogFormat": "JSON",
    "LogGroupNames": [
        "/aws/lambda/emf-test"
    ]
}
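To sanity-check what that rule counts before deploying it, here is a local approximation over a few sample JSON log lines: keep events where `$.Latency` is greater than 400, key them by `$.Server`, and count. This only mirrors the rule's filter/key/count logic; it says nothing about how Contributor Insights itself evaluates rules, and the log lines are invented.

```python
import json
from collections import Counter

THRESHOLD_MS = 400  # mirrors "GreaterThan": 400 in the rule above


def top_contributors(log_lines, threshold=THRESHOLD_MS):
    """Count events per Server whose Latency is strictly above the threshold,
    most frequent first."""
    counts = Counter()
    for line in log_lines:
        event = json.loads(line)
        latency = event.get("Latency")
        if latency is not None and latency > threshold:
            counts[event.get("Server")] += 1
    return counts.most_common()


logs = [
    '{"Server": "server1", "Latency": 450}',
    '{"Server": "server2", "Latency": 120}',
    '{"Server": "server1", "Latency": 800}',
    '{"Server": "server3", "Latency": 401}',
    '{"Server": "server2", "Latency": 400}',  # not counted: 400 is not > 400
]
print(top_contributors(logs))  # [('server1', 2), ('server3', 1)]
```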

The result is a list of the servers with the most datapoints over 400 ms:

[image: Contributor Insights top contributors report]

Bringing it all together with CloudWatch Embedded Metric Format: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Embedded_Metric_Format.html

If you publish your data in CloudWatch Embedded Metric Format (EMF), you can:

Easily configure dimensions, so you can have per-server metrics and an overall metric if you want.
Use CloudWatch Logs Insights to query and visualise your logs.
Use Contributor Insights to get top contributors.
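A minimal sketch of one EMF log event for this scenario: the two dimension sets `["Server"]` and `[]` ask CloudWatch to extract Latency both per server and with no dimensions (the overall metric) from a single write. It assumes an empty dimension set is accepted, as the EMF specification describes; verify against the current docs. The namespace and helper name are hypothetical. Because `Server` and `Latency` are top-level JSON fields, the same line would also satisfy the `$.Server` / `$.Latency` paths in the Contributor Insights rule above.

```python
import json
import time


def emf_latency_line(server, latency_ms, namespace="SomeNamespace"):
    """Build one EMF log event (hypothetical helper). Dimension sets
    ["Server"] and [] request a per-server metric and an overall metric."""
    document = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [["Server"], []],
                    "Metrics": [{"Name": "Latency", "Unit": "Milliseconds"}],
                }
            ],
        },
        "Server": server,
        "Latency": latency_ms,
    }
    return json.dumps(document)


# In Lambda, printing to stdout sends the line to CloudWatch Logs.
print(emf_latency_line("server1", 123))
```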


answered Sep 20 '22 20:09

Dejan Peretin
