
Get buckets average of a date_histogram, elasticsearch

I have the following query, which fetches the data and creates an aggregation bucket for each past hour:

    query = {
        "query": {
            "bool": {
                # Only "Connected" events from the given device...
                "must": [
                    {"term": {"deviceId": device}},
                    {"match": {"eventType": "Connected"}},
                ],
                # ...excluding messages that contain "Pong".
                "must_not": [
                    {
                        "query_string": {
                            "query": "Pong",
                            "fields": ["data.message"],
                        }
                    },
                ],
            },
        },
        # Return only the aggregation results, no hits.
        "size": 0,
        "sort": [{"timestamp": {"order": "desc"}}],
        "aggs": {
            "time_buckets": {
                # One bucket per hour.
                "date_histogram": {
                    "field": "timestamp",
                    "interval": "hour",
                },
            }
        },
    }

I would like to get the average of a field for each hour interval (that is, for each bucket created by the aggregation). This article discusses something similar to what I want to do: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_looking_at_time.html ("What was the average latency of our website every hour in the last week?"). However, it doesn't explain exactly how to do it.

Does anyone know how to do that?

Joab Mendes asked Feb 12 '15 22:02



1 Answer

I just realized that I can nest an aggregation inside the date_histogram and compute the average of a field within each bucket. Here is what I did, and it works properly now:

    query = {
        "query": {
            "bool": {
                "must": [
                    {"term": {"deviceId": device}},
                    {"match": {"eventType": "Connected"}},
                ],
                "must_not": [
                    {
                        "query_string": {
                            "query": "Pong",
                            "fields": ["data.message"],
                        }
                    },
                ],
            },
        },
        "size": 0,
        "sort": [{"timestamp": {"order": "desc"}}],
        "aggs": {
            "time_buckets": {
                "date_histogram": {
                    "field": "timestamp",
                    # "day" here; use "hour" for the hourly buckets from the question.
                    "interval": "day",
                },
                # Sub-aggregation: computed separately inside every bucket.
                "aggs": {
                    "avg_battery": {
                        "avg": {"field": "data.battery-level"}
                    }
                },
            }
        },
    }
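
To read the averages back, each bucket in the response carries the sub-aggregation result under the name it was given in the query. Below is a minimal sketch of that step. The helper name `bucket_averages` and the sample response are made up for illustration; only the response layout (`aggregations` → `time_buckets` → `buckets`, each with `avg_battery.value`) follows from the query above.

```python
def bucket_averages(response, bucket_agg="time_buckets", metric_agg="avg_battery"):
    """Return (bucket start, average) pairs from a date_histogram response.

    Hypothetical helper: the default names match the aggregation names
    used in the query above.
    """
    results = []
    for bucket in response["aggregations"][bucket_agg]["buckets"]:
        # Prefer the formatted date string; fall back to the epoch-millis key.
        start = bucket.get("key_as_string", bucket["key"])
        # "value" is None when no document in the bucket has the field.
        results.append((start, bucket[metric_agg]["value"]))
    return results


# Hand-written sample in the shape Elasticsearch returns; the numbers are invented.
sample_response = {
    "aggregations": {
        "time_buckets": {
            "buckets": [
                {"key_as_string": "2015-02-12T00:00:00.000Z", "key": 1423699200000,
                 "doc_count": 3, "avg_battery": {"value": 87.5}},
                {"key_as_string": "2015-02-13T00:00:00.000Z", "key": 1423785600000,
                 "doc_count": 0, "avg_battery": {"value": None}},
            ]
        }
    }
}

for start, average in bucket_averages(sample_response):
    print(start, average)
```

With a real client, the dictionary returned by the search call would take the place of `sample_response`. Note also that on Elasticsearch 7.2 and later the `interval` parameter of `date_histogram` is deprecated in favor of `calendar_interval` or `fixed_interval`, so the query may need that rename on newer clusters.
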
Joab Mendes answered Oct 11 '22 23:10