
Is it possible to group results by a key with Elasticsearch aggregations?

I'm trying to perform an Elasticsearch query and would like Elasticsearch to group the results for me, instead of having my client code do it manually. Looking at the Elasticsearch documentation, it looks like a bucket aggregation is what I'm looking for, but I can't find any examples that use it or that show what the output looks like, so I can't be sure it's what I want.

My question is: is it possible to group documents by a key in Elasticsearch? If so, how and where can I find documentation on how to do it, either using the query DSL or (preferably) the Javadoc for the Java API?

Eric Hydrick asked Aug 08 '14 21:08



1 Answer

It sounds like you want to group by a field in Elasticsearch. You can do that with a terms aggregation.

Here is how to do it with the query DSL:

POST _search 
{
   "aggs": {
      "genders": {
         "terms": {
            "field": "gender"
         },
         "aggs": {
            "top_tag_hits": {
               "top_hits": {
                  "_source": {
                     "include": [
                        "include_fields_name"
                     ]
                  },
                  "size": 100
               }
            }
         }
      }
   }
}

Here gender is a field in the documents. The response looks something like this:

{
    ...

    "aggregations" : {
        "genders" : {
            "buckets" : [
                {
                    "key" : "male",
                    "doc_count" : 10,
                    "top_tag_hits":{"hits":...}
                },
                {
                    "key" : "female",
                    "doc_count" : 10,
                    "top_tag_hits":{"hits":...}
                }
            ]
        }
    }
}
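
To make the shape of that response concrete, here is a rough in-memory sketch of what the terms + top_hits combination computes: one bucket per distinct key, a doc_count for the whole bucket, and at most `size` documents kept per bucket. This is plain Java with no Elasticsearch dependency; the class `TermsBucketsSketch`, the `Bucket` type and the sample documents are made up for illustration and are not part of any Elasticsearch API. It is also the client-side grouping you were hoping to avoid, which is the point: the aggregation does this on the server.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TermsBucketsSketch {

    // Hypothetical stand-in for one entry of the "buckets" array in the response.
    static final class Bucket {
        final String key;
        long docCount = 0;                                         // like "doc_count"
        final List<Map<String, String>> hits = new ArrayList<>();  // like the top_hits sub-aggregation

        Bucket(String key) { this.key = key; }
    }

    // Groups docs by the value of `field`, keeping at most `size` docs per bucket.
    // Docs without the field are skipped, as a terms aggregation does by default.
    static List<Bucket> groupBy(List<Map<String, String>> docs, String field, int size) {
        Map<String, Bucket> byKey = new LinkedHashMap<>();
        for (Map<String, String> doc : docs) {
            String key = doc.get(field);
            if (key == null) continue;
            Bucket bucket = byKey.computeIfAbsent(key, Bucket::new);
            bucket.docCount++;                       // counts every doc in the bucket
            if (bucket.hits.size() < size) {
                bucket.hits.add(doc);                // kept in input order here, not by score
            }
        }
        List<Bucket> buckets = new ArrayList<>(byKey.values());
        // A terms aggregation orders buckets by doc_count, largest first, by default.
        buckets.sort((a, b) -> Long.compare(b.docCount, a.docCount));
        return buckets;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("name", "Ann", "gender", "female"),
                Map.of("name", "Bob", "gender", "male"),
                Map.of("name", "Cid", "gender", "male"));
        for (Bucket b : groupBy(docs, "gender", 100)) {
            System.out.println(b.key + " doc_count=" + b.docCount + " hits=" + b.hits.size());
        }
    }
}
```

One difference worth keeping in mind: real top_hits results are ranked (by score unless you specify a sort), while this sketch simply keeps the first documents it sees.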

Using the Java API (I've added a top_hits sub-aggregation in response to your comment):

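// Imports for the older Java client API used in this answer
import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.metrics.tophits.TopHits;

// client is an existing org.elasticsearch.client.Client
client.prepareSearch("index").setTypes("types").addAggregation(
        // group by the "gender" field, keeping up to 10 documents per bucket
        AggregationBuilders.terms("agg_name").field("gender").subAggregation(
                AggregationBuilders.topHits("documents").setSize(10)
        )
).execute(new ActionListener<SearchResponse>() {
    @Override
    public void onResponse(SearchResponse response) {
        // look the aggregations up by the names given above
        Terms terms = response.getAggregations().get("agg_name");
        for (Terms.Bucket bucket : terms.getBuckets()) {
            TopHits topHits = bucket.getAggregations().get("documents");
            System.out.println("term = " + bucket.getKey());
            // do what you want with top hits..
        }
    }

    @Override
    public void onFailure(Throwable e) {
        e.printStackTrace();
    }
});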
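
If you can't use the client library, the same request can in principle be built by hand and sent to the `_search` endpoint over HTTP. The following is only a minimal sketch of building the body: `TermsQueryBody` and `build` are hypothetical names, it assumes the aggregation and field names contain nothing that needs JSON escaping, and the top-level `"size": 0` (which suppresses the ordinary search hits so only the aggregations come back) is my addition and not part of the query shown earlier.

```java
public class TermsQueryBody {

    // Builds a request body with a terms aggregation on `field` and a
    // top_hits sub-aggregation named "documents" (the name used in the Java example).
    // Assumption: aggName and field need no JSON escaping.
    static String build(String aggName, String field, int hitsPerBucket) {
        return "{"
                + "\"size\":0,"
                + "\"aggs\":{\"" + aggName + "\":{"
                + "\"terms\":{\"field\":\"" + field + "\"},"
                + "\"aggs\":{\"documents\":{\"top_hits\":{\"size\":" + hitsPerBucket + "}}}"
                + "}}}";
    }

    public static void main(String[] args) {
        // Prints the JSON body; POST it to <index>/_search with any HTTP client.
        System.out.println(build("genders", "gender", 10));
    }
}
```

For anything beyond a quick experiment, a proper JSON library is a safer way to build the body than string concatenation.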

Hope this helps!!

progrrammer answered Nov 14 '22 21:11