How to get latest values for each group with an Elasticsearch query?

I have some documents indexed in Elasticsearch that look like these samples:

{'country': 'France', 'collected': '2015-03-12', 'value': 20}
{'country': 'Canada', 'collected': '2015-03-12', 'value': 21}
{'country': 'Brazil', 'collected': '2015-03-12', 'value': 33}
{'country': 'France', 'collected': '2015-02-01', 'value': 10}
{'country': 'Canada', 'collected': '2015-02-01', 'value': 11}
{'country': 'Mexico', 'collected': '2015-02-01', 'value': 9}
...

I want to build a query that returns one result per country, namely the document with max(collected).

So, for the examples shown above, the results would be something like:

{'country': 'France', 'collected': '2015-03-12', 'value': 20}
{'country': 'Canada', 'collected': '2015-03-12', 'value': 21}
{'country': 'Brazil', 'collected': '2015-03-12', 'value': 33}
{'country': 'Mexico', 'collected': '2015-02-01', 'value': 9}
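To make the intended semantics concrete, here is a plain-Python sketch that computes the expected output from the sample documents above. It does not talk to Elasticsearch at all, and the helper name `latest_per_group` is hypothetical, chosen only for illustration.

```python
# Plain-Python sketch of the desired semantics (no Elasticsearch involved):
# for each country, keep only the document with the greatest 'collected' date.
docs = [
    {'country': 'France', 'collected': '2015-03-12', 'value': 20},
    {'country': 'Canada', 'collected': '2015-03-12', 'value': 21},
    {'country': 'Brazil', 'collected': '2015-03-12', 'value': 33},
    {'country': 'France', 'collected': '2015-02-01', 'value': 10},
    {'country': 'Canada', 'collected': '2015-02-01', 'value': 11},
    {'country': 'Mexico', 'collected': '2015-02-01', 'value': 9},
]


def latest_per_group(docs, group_key, date_key):
    """Hypothetical helper: return one document per group, the one with the max date_key."""
    latest = {}
    for doc in docs:
        key = doc[group_key]
        # ISO-8601 dates (YYYY-MM-DD) compare correctly as plain strings.
        if key not in latest or doc[date_key] > latest[key][date_key]:
            latest[key] = doc
    return list(latest.values())


for doc in latest_per_group(docs, 'country', 'collected'):
    print(doc)
```

Run as-is, this prints the four documents listed above, in the same order.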

I realize I need to aggregate on country, but I can't figure out how to limit each group's results to the document with max(collected).

Any ideas?

asked Mar 12 '15 at 17:03 by Elias Dorneles



2 Answers

You can use a terms aggregation that groups on the country field, with a top_hits sub-aggregation that returns one document per group, sorted by the collected date in descending order:

POST /test/_search
{
    "size": 0,
    "aggs": {
        "group": {
            "terms": {
                "field": "country"
            },
            "aggs": {
                "group_docs": {
                    "top_hits": {
                        "size": 1,
                        "sort": [
                            {
                                "collected": {
                                    "order": "desc"
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
}
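If you are driving this from code, the sketch below builds the same terms + top_hits request body as a Python dict and pulls the newest document out of each bucket of the response. Two details are worth knowing: a top-level "size": 0 suppresses the ordinary hits and replaces search_type=count, which was removed in Elasticsearch 5.0; and a terms aggregation returns only 10 buckets unless its own "size" is raised. The helper names, the `max_groups` default of 100, and `sample_response` are assumptions for illustration. The response is a hand-written fragment shaped like Elasticsearch's output, not captured from a real cluster, and its bucket keys assume `country` is mapped as a non-analyzed (keyword) field.

```python
import json


def latest_per_group_query(group_field, date_field, max_groups=100):
    """Hypothetical helper: terms buckets on group_field, each holding the
    single document with the newest date_field (via a top_hits sub-agg)."""
    return {
        "size": 0,  # skip top-level hits; only the aggregation matters
        "aggs": {
            "group": {
                # terms returns only 10 buckets unless "size" is raised
                "terms": {"field": group_field, "size": max_groups},
                "aggs": {
                    "group_docs": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{date_field: {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


def extract_latest(response):
    """Hypothetical helper: take the single top hit's _source from every bucket."""
    buckets = response["aggregations"]["group"]["buckets"]
    return [b["group_docs"]["hits"]["hits"][0]["_source"] for b in buckets]


# Hand-written response fragment (illustrative only, fields trimmed).
sample_response = {
    "aggregations": {
        "group": {
            "buckets": [
                {"key": "France", "doc_count": 2,
                 "group_docs": {"hits": {"hits": [
                     {"_source": {"country": "France", "collected": "2015-03-12", "value": 20}}]}}},
                {"key": "Mexico", "doc_count": 1,
                 "group_docs": {"hits": {"hits": [
                     {"_source": {"country": "Mexico", "collected": "2015-02-01", "value": 9}}]}}},
            ]
        }
    }
}

print(json.dumps(latest_per_group_query("country", "collected"), indent=2))
for doc in extract_latest(sample_response):
    print(doc)
```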
answered Oct 05 '22 at 21:10 by Dan Tuffery



For those who, like user1892775, run into "Fielddata is disabled on text fields by default...", you can create a multi-field (https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html). Your mapping might then look like this:

PUT /test
{
  "mappings": {
    "properties": {
      "country": {"type": "text", "fields": {"raw": {"type": "keyword"}}}
    }
  }
}

Then your query would look like this:

POST /test/_search
{
    "size": 0,
    "aggs": {
        "group": {
            "terms": {
                "field": "country.raw"
            },
            "aggs": {
                "group_docs": {
                    "top_hits": {
                        "size": 1,
                        "sort": [
                            {
                                "collected": {
                                    "order": "desc"
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
}

(Note the use of country.raw)
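Why group on country.raw rather than country? If you simply enabled fielddata on the analyzed country field, the terms aggregation would bucket on individual tokens, so a value like "United States" would land in separate "united" and "states" buckets. The plain-Python sketch below only imitates that effect. The sample country list is hypothetical, and `rough_standard_tokens` is a crude lowercase-and-split stand-in, not Elasticsearch's real standard analyzer.

```python
import re
from collections import Counter


def rough_standard_tokens(text):
    """Crude stand-in for an analyzed text field: lowercase the value and
    split it into alphanumeric tokens. Illustration only."""
    return re.findall(r"[a-z0-9]+", text.lower())


countries = ["France", "United States", "United Kingdom", "United States"]

# Bucketing on analyzed tokens: one bucket per token, not per country.
token_buckets = Counter(t for c in countries for t in rough_standard_tokens(c))

# Bucketing on the untouched value, which is what a keyword sub-field keeps.
raw_buckets = Counter(countries)

print(sorted(token_buckets.items()))
print(sorted(raw_buckets.items()))
```

The token buckets come out as france, kingdom, states and united, which no longer correspond to countries. The raw buckets keep one entry per country value.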

answered Oct 05 '22 at 20:10 by Christian Sepulveda
