Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Elasticsearch filter document group by field

I have some documents:

{"name": "John", "district": 1},
{"name": "Mary", "district": 2},
{"name": "Nick", "district": 1},
{"name": "Bob", "district": 3},
{"name": "Kenny", "district": 1}

How can I filter/select distinct documents by district?

{"name": "John", "district": 1},
{"name": "Mary", "district": 2},
{"name": "Bob", "district": 3}

In SQL, I can use GROUP BY. I tried terms aggregation but it only returned count distinct.

"aggs": {
  "distinct": {
    "terms": {
      "field": "district",
      "size": 0
    }
  }
}

Thank for your help! :-)

like image 577
Kevin Avatar asked Sep 23 '14 03:09

Kevin


1 Answers

If your ElasticSearch version is 1.3 or above, you could use a subaggregation of type top_hits which will give you (by default) the top three matching documents sorted on your query score (here, 1 as you use a match_all query).

You can set the size parameter to more than 3.

The following dataset and query :

POST /test/districts/
{"name": "John", "district": 1}

POST /test/districts/
{"name": "Mary", "district": 2}

POST /test/districts/
{"name": "Nick", "district": 1}

POST /test/districts/
{"name": "Bob", "district": 3}

POST test/districts/_search
{
  "size": 0, 
  "aggs":{
    "by_district":{
      "terms": {
        "field": "district",
        "size": 0
      },
      "aggs": {
        "tops": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}

Will output the documents the way you want :

{
   "took": 5,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 4,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "by_district": {
         "buckets": [
            {
               "key": 1,
               "key_as_string": "1",
               "doc_count": 2,
               "tops": {
                  "hits": {
                     "total": 2,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "test",
                           "_type": "districts",
                           "_id": "XYHu4I-JQcOfLm3iWjTiOg",
                           "_score": 1,
                           "_source": {
                              "name": "John",
                              "district": 1
                           }
                        },
                        {
                           "_index": "test",
                           "_type": "districts",
                           "_id": "5dul2XMTRC2IpV_tKRRltA",
                           "_score": 1,
                           "_source": {
                              "name": "Nick",
                              "district": 1
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": 2,
               "key_as_string": "2",
               "doc_count": 1,
               "tops": {
                  "hits": {
                     "total": 1,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "test",
                           "_type": "districts",
                           "_id": "I-9Gd4OYSRuexhP1dCdQ-g",
                           "_score": 1,
                           "_source": {
                              "name": "Mary",
                              "district": 2
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": 3,
               "key_as_string": "3",
               "doc_count": 1,
               "tops": {
                  "hits": {
                     "total": 1,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "test",
                           "_type": "districts",
                           "_id": "bti2y-OUT3q2mBNhhI3xeA",
                           "_score": 1,
                           "_source": {
                              "name": "Bob",
                              "district": 3
                           }
                        }
                     ]
                  }
               }
            }
         ]
      }
   }
}
like image 143
ThomasC Avatar answered Oct 05 '22 15:10

ThomasC