Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to do a year over year aggregation with Elasticsearch?

Assuming I have a date field on a document, I know using the date_histogram aggregation I can get a document count by day, month, year, etc.

What I want to do is get the average document count for January, February, March, etc. over several given years. The same goes for Monday, Tuesday, Wednesday, etc. over several given weeks. Is there a way to do this having just that same date field or what is the best way to accomplish this with Elasticsearch?


Example

Let's say we have a bunch of orders placed over three years:

  • 2012 - Jan (10 orders), Feb (5 orders), Mar (7 orders), Apr (11 orders), etc
  • 2013 - Jan (13 orders), Feb (7 orders), Mar (12 orders), Apr (15 orders), etc.
  • 2014 - Jan (10 orders), Feb (7 orders), Mar (6 orders), Apr (13 orders), etc.

What I want is the average of each month over the given years, so the output would be:

Jan (10 + 13 + 10 / 3 = 11 orders), Feb (6.33 orders), Mar (8.33 orders), Apr (13 orders), etc.

It would be best if this can be generalized for N years (or N Januaries, etc.) so that we search over any date range.

like image 254
nascosto Avatar asked Sep 29 '22 20:09

nascosto


1 Answers

You can use 'monthOfYear' like this:

"aggregations": {
    "timeslice": {
        "histogram": {
            "script": "doc['timestamp'].date.getMonthOfYear()",
            "interval": 1,
            "min_doc_count": 0,
            "extended_bounds": {
                "min": 1,
                "max": 12
            },
            "order": {
                "_key": "desc"
            }
        }
    }

The extended bounds will ensure you get a value for every month (even if it is zero).

If you want the month names, you can either do that in your own code, or, do this (at the consequence that you won't get values for months that have no data):

"aggregations": {
    "monthOfYear": {
        "terms": {
            "script": "doc['timestamp'].date.monthOfYear().getAsText()",
            "order": {
                "_term": "asc"
            }
        }
    }

Once you've got this, you can nest your stats aggregation inside this one:

"aggregations: {
    "monthOfYear": {
        "terms": {
            ...
        },
        "aggregations": {
            "stats": ...
        }
    }
 }

The question is pretty old now, but, hope this helps someone.

like image 178
RichS Avatar answered Oct 11 '22 12:10

RichS