
Elasticsearch exclude top hit on field value

{'country': 'France', 'collected': '2018-03-12', 'active': true}
{'country': 'France', 'collected': '2018-03-13', 'active': true}
{'country': 'France', 'collected': '2018-03-14', 'active': false}
{'country': 'Canada', 'collected': '2018-02-01', 'active': false}
{'country': 'Canada', 'collected': '2018-02-02', 'active': true}

Let's say I have these documents and I want to group them by country, keeping only the most recent document for each country. After grouping, this is the result:

{'country': 'France', 'collected': '2018-03-14', 'active': false}
{'country': 'Canada', 'collected': '2018-02-02', 'active': true}

But I want to exclude countries whose most recent row has active set to false. The older rows of the same country can be true or false; it only matters that the most recent row is true. How can I do that in Elasticsearch? Here is my query:

POST /test/_search?search_type=count
{
    "aggs": {
        "group": {
            "terms": {
                "field": "country"
            },
            "aggs": {
                "group_docs": {
                    "top_hits": {
                        "size": 1,
                        "sort": [
                            {
                                "collected": {
                                    "order": "desc"
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
}
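To make the expected result concrete, here is the same logic as a small plain-Python sketch over the sample rows above. It is only an illustration of what I am after, not an Elasticsearch solution, and the function name latest_active_per_country is made up for this example.

```python
# Sample rows from the question, written as Python dicts.
rows = [
    {"country": "France", "collected": "2018-03-12", "active": True},
    {"country": "France", "collected": "2018-03-13", "active": True},
    {"country": "France", "collected": "2018-03-14", "active": False},
    {"country": "Canada", "collected": "2018-02-01", "active": False},
    {"country": "Canada", "collected": "2018-02-02", "active": True},
]


def latest_active_per_country(rows):
    """Keep each country's newest row, then drop it if that row is inactive."""
    latest = {}
    for row in rows:
        current = latest.get(row["country"])
        # ISO-8601 date strings compare correctly as plain strings.
        if current is None or row["collected"] > current["collected"]:
            latest[row["country"]] = row
    return [row for row in latest.values() if row["active"]]


result = latest_active_per_country(rows)
print(result)
```

With the sample data only Canada should remain, because France's most recent row (2018-03-14) is inactive.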
Ismail asked Jul 16 '18 11:07




1 Answer

I think you can get away with sorting by two fields in your top_hits: first by active, then by collected. Basically, you want the true values to come first and, when active is equal, the most recent collected date to win. Something like the following returns, for each country, its most recent active:true document whenever one exists.

The only downside to this solution is that if a country doesn't have any active documents, top_hits will still return one active:false document for it.

{
  "size": 0,
  "aggs": {
    "group": {
      "terms": {
        "field": "country"
      },
      "aggs": {
        "group_docs": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "active": {
                  "order": "desc"
                }
              },
              {
                "collected": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
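As a rough illustration of what this two-field sort picks, here is a plain-Python sketch that mimics terms plus top_hits with size 1 on the sample rows from the question. It only simulates the ordering and does not talk to Elasticsearch. The function name top_hit_per_country and the Spain row are hypothetical, added just to show the downside mentioned above.

```python
rows = [
    {"country": "France", "collected": "2018-03-12", "active": True},
    {"country": "France", "collected": "2018-03-13", "active": True},
    {"country": "France", "collected": "2018-03-14", "active": False},
    {"country": "Canada", "collected": "2018-02-01", "active": False},
    {"country": "Canada", "collected": "2018-02-02", "active": True},
    # Hypothetical extra row: a country with no active documents at all.
    {"country": "Spain", "collected": "2018-01-05", "active": False},
]


def top_hit_per_country(rows):
    """Pick one row per country, ordered by active desc, then collected desc."""
    best = {}
    for row in rows:
        # True sorts above False; ISO dates compare correctly as strings.
        key = (row["active"], row["collected"])
        current = best.get(row["country"])
        if current is None or key > (current["active"], current["collected"]):
            best[row["country"]] = row
    return best


hits = top_hit_per_country(rows)
for country, row in hits.items():
    print(country, row["collected"], row["active"])
```

Note that France is not dropped here: it comes back with its most recent active row (2018-03-13) instead of the inactive 2018-03-14 one, and the hypothetical Spain bucket still returns an inactive document. If those buckets have to disappear completely, they would need to be filtered out on the client side after the search.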
Andrei Stefan answered Oct 06 '22 00:10
