Is there a way to have Elasticsearch return a hit per generated bucket during an aggregation?

Right now I have a query like this:

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "uuid": "xxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxxx"
                    }
                },
                {
                    "range": {
                        "date": {
                            "from": "now-12h",
                            "to": "now"
                        }
                    }
                }
            ]
        }
    },
    "aggs": {
        "query": {
            "terms": {
                "field": "query",
                "size": 3
            }
        }
    }
}

The aggregation works perfectly well, but I can't find a way to control which hits are returned. I can use the size parameter at the top level of the DSL, but the returned hits are not in the same order as the buckets, so the bucket results do not line up with the hit results. Is there any way to correct this, or do I have to issue two separate queries?

AgentRegEdit asked Mar 13 '14 05:03

3 Answers

To expand on Filipe's answer, the top_hits aggregation seems to be what you are looking for. Nest it under the terms aggregation so that each bucket carries its own hits, e.g.:

{
  "query": {
    ... snip ...
  },
  "aggs": {
    "query": {
      "terms": {
        "field": "query",
        "size": 3
      },
      "aggs": {
        "top": {
          "top_hits": {
            "size": 42
          }
        }
      }
    }
  }
}
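A minimal Python sketch of the same request body, plus a helper that reads the response back bucket by bucket. It assumes the standard top_hits response shape, where each bucket carries its own `hits.hits` array under the sub-aggregation name (`top` here). The function names and the `"size": 0` choice are illustrative assumptions, and the `... snip ...` query is left out.

```python
from typing import Any, Dict, List


def build_top_hits_body(field: str, buckets: int, hits_per_bucket: int) -> Dict[str, Any]:
    """Terms aggregation with a top_hits sub-aggregation, mirroring the answer above."""
    return {
        "size": 0,  # assumption: the flat top-level hits are no longer needed
        "aggs": {
            "query": {
                "terms": {"field": field, "size": buckets},
                # each bucket gets its own list of matching documents
                "aggs": {"top": {"top_hits": {"size": hits_per_bucket}}},
            }
        },
    }


def hits_by_bucket(response: Dict[str, Any], agg_name: str = "query",
                   sub_name: str = "top") -> Dict[Any, List[Dict[str, Any]]]:
    """Map each bucket key to the _source of the hits returned inside that bucket."""
    result: Dict[Any, List[Dict[str, Any]]] = {}
    for bucket in response["aggregations"][agg_name]["buckets"]:
        hits = bucket[sub_name]["hits"]["hits"]
        result[bucket["key"]] = [hit["_source"] for hit in hits]
    return result
```

Because the hits live inside each bucket, the bucket order and the hit groups line up by construction; the dict preserves the bucket order Elasticsearch returned.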
Shadocko answered Sep 26 '22 05:09


Your query only uses exact-value conditions (the uuid and the date range) combined with boolean logic (bool/must), so it should probably be converted to use filters instead (note that match becomes term):

{
   "query": {
      "filtered": {
         "filter": {
            "bool": {
               "must": [
                  {
                     "term": {
                        "uuid": "xxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxxx"
                     }
                  },
                  {
                     "range": {
                        "date": {
                           "from": "now-12h",
                           "to": "now"
                        }
                     }
                  }
               ]
            }
         }
      }
   }
}
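The same filter clauses sketched as Python dicts. The 1.x function mirrors the filtered query above; the second function is a swapped-in alternative for Elasticsearch 2.0 and later, where the `filtered` query was deprecated (and later removed) in favor of the `filter` clause of a `bool` query. Function names are hypothetical. One caveat: a `term` query on `uuid` only matches if that field is indexed as a single token (`not_analyzed` in 1.x, `keyword` in later versions); an analyzed field would be split on the hyphens.

```python
from typing import Any, Dict, List


def uuid_time_filters(uuid: str, since: str = "now-12h") -> List[Dict[str, Any]]:
    """The two exact-value clauses: a term on uuid and a date range."""
    return [
        {"term": {"uuid": uuid}},
        # gte/lte match from/to with their default inclusive bounds
        {"range": {"date": {"gte": since, "lte": "now"}}},
    ]


def filtered_query_1x(uuid: str) -> Dict[str, Any]:
    """Elasticsearch 1.x style: a `filtered` query wrapping a bool filter."""
    return {"query": {"filtered": {"filter": {"bool": {"must": uuid_time_filters(uuid)}}}}}


def bool_filter_query(uuid: str) -> Dict[str, Any]:
    """Elasticsearch 2.0+ style: bool's `filter` clause replaces `filtered`."""
    return {"query": {"bool": {"filter": uuid_time_filters(uuid)}}}
```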

As for the aggregations, you wrote:

"The hits that are returned do not represent all the buckets that were returned. So if I have buckets for terms 'a', 'b', and 'c', I want to have hits that represent those buckets as well."

Perhaps you are looking to control the scope of the buckets? You can use a global aggregation so that its buckets are not influenced by the query or filter.
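A hedged sketch of what a global bucket looks like, built as a Python dict. `global` takes an empty object and must sit at the top level of `aggs`; everything nested under it runs over all documents in the searched indices rather than only those matching the query. The aggregation name `all_docs` and the function name are hypothetical.

```python
from typing import Any, Dict


def global_terms_body(query: Dict[str, Any], field: str, size: int) -> Dict[str, Any]:
    """Terms buckets computed over every document, ignoring `query`."""
    return {
        "query": query,  # still controls the top-level hits
        "aggs": {
            "all_docs": {  # hypothetical aggregation name
                "global": {},  # global must be a top-level aggregation
                "aggs": {
                    "query": {"terms": {"field": field, "size": size}},
                },
            }
        },
    }
```

In the response, the buckets would then appear under `aggregations.all_docs.query.buckets`.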

Keep in mind that Elasticsearch will not "group" hits in any way: the hits section is always a flat list ordered by score and any additional sort options.

Aggregations can be organized in a nested structure and return computed or extracted values in a specific order. For the terms aggregation, buckets are ordered by descending document count by default (the term with the most matching documents first). The hits section of the response is never influenced by your choice of aggregations. Similarly, you cannot find hits in the aggregation sections.

If your goal is to group documents by a certain field, yes, you will need to run multiple queries in the current Elasticsearch release.
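If you do go the multiple-query route, one way to keep the round trips down is a single `_msearch` request with one search per bucket key. A minimal Python sketch that builds the newline-delimited body, assuming you already have the bucket keys from the terms aggregation; the index name `logs`, the function name, and the `base_filters` parameter (standing in for the uuid/date conditions from the question) are hypothetical.

```python
import json
from typing import Any, Dict, List


def msearch_body(index: str, field: str, keys: List[str],
                 base_filters: List[Dict[str, Any]], hits_per_key: int) -> str:
    """Build an _msearch payload: one header line and one body line per bucket key."""
    lines = []
    for key in keys:
        lines.append(json.dumps({"index": index}))  # header: which index to search
        lines.append(json.dumps({
            "size": hits_per_key,
            # keep the original conditions and add one term clause for this bucket
            "query": {"bool": {"must": base_filters + [{"term": {field: key}}]}},
        }))
    # the _msearch format requires every line, including the last, to end with a newline
    return "\n".join(lines) + "\n"
```

The responses come back in the same order as the requests, so the nth response holds the hits for the nth bucket key.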

BenG answered Sep 24 '22 05:09


I'm not 100% sure, but I think there's no way to do that in the current version of Elasticsearch (1.2.x). The good news is that there will be once version 1.3.x is released, through the top_hits aggregation:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html

Filipe answered Sep 24 '22 05:09