Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Aggregate logs by field value and plot as multiple series using AWS CloudWatch Insights

Tags:

Analysing some log files using AWS CloudWatch Insights, I can plot a count aggregated in time bins with:

| stats count(*) by bin(1h) 

This produces a graph, as expected, aggregating all logs in each time bin.

I want to split this data by a 'group' field, with values A and B.

| stats count(*) by group, bin(1h) 

This returns log counts across time bins as expected, but the visualisation tab says 'No visualisation available.' I would like it to return a time series plot with a series for group A and a series for group B.

Where am I going wrong, or is this simply not possible?

like image 681
Tom Dufall Avatar asked Jul 31 '19 11:07

Tom Dufall


People also ask

How do I use aggregate logs in CloudWatch?

To run a query with an aggregation functionIn the navigation pane, choose Logs, and then choose Logs Insights. In the Select log group(s) drop down, choose one or more log groups to query. You can enter the name of log groups that you want to query in the search bar.

How do you visualize CloudWatch logs?

To see the visualization, run your query. Then choose the Visualization tab, select the arrow next to Line, and choose Bar. Visualizations are limited to up to 100 bars in the bar chart. For a complete tutorial, see Tutorial: Run a query that produces a visualization grouped by log fields.


2 Answers

Alright, I found a very janky way to solve this. It appears that no, you cannot do

| stats count(*) by group, bin(1h) 

However, you can use parse to artificially create new variables, like so:

parse "[E*]" as @error | parse "[I*]" as @info | parse "[W*]" as @warning | filter ispresent(@warning) or ispresent(@error) or ispresent(@info) | stats count(@error) as error, count(@info) as info, count(@warning) as warning by bin(15m) 

Here, I'm trying to look at the distribution of log types over time. I have three types of log messages with formats: "[ERROR]", "[INFO]" and "[WARNING]"

like image 151
smth Avatar answered Sep 19 '22 13:09

smth


Expanding on the sample of @smth, i usually do it a little different,

with this query i follow the trend of status codes aggregated over time on a standard nginx access log

fields @timestamp, @message | parse @message '* - * [*] "* * *" * * "-" "*"' as host, identity, dateTimeString, httpVerb, url, protocol, status, bytes, useragent | stats count (*) as all, sum ( status < 299 ) as c_s200, sum ( status > 299 and status < 399 ) as c_s300, sum ( status > 399 and status < 499 ) as c_s400, sum ( status > 499 ) as c_s500 by bin (1m)  

The trick on this is that expressions like "status > 499" returns 0 if false and 1 if true, and so, adding it up on the time bucket allows to simulate something like a 'count if [condition]'

And so does the sample generated graph look like, on the visualisation tab.

amount of each stausCode aggregated on 1 minute buckets

like image 41
6ugr3 Avatar answered Sep 21 '22 13:09

6ugr3