I have some documents indexed on Elasticsearch, looking like these samples:
{'country': 'France', 'collected': '2015-03-12', 'value': 20}
{'country': 'Canada', 'collected': '2015-03-12', 'value': 21}
{'country': 'Brazil', 'collected': '2015-03-12', 'value': 33}
{'country': 'France', 'collected': '2015-02-01', 'value': 10}
{'country': 'Canada', 'collected': '2015-02-01', 'value': 11}
{'country': 'Mexico', 'collected': '2015-02-01', 'value': 9}
...
I want to build a query that gets one result per country, getting only the ones with max(collected)
.
So, for the examples shown above, the results would be something like:
{'country': 'France', 'collected': '2015-03-12', 'value': 20}
{'country': 'Canada', 'collected': '2015-03-12', 'value': 21}
{'country': 'Brazil', 'collected': '2015-03-12', 'value': 33}
{'country': 'Mexico', 'collected': '2015-02-01', 'value': 9}
I realized I need to do aggregation on country
, but I'm failing to understand how to limit the results on max(collected)
.
Any ideas?
If a search request results in more than ten hits, ElasticSearch will, by default, only return the first ten hits. To override that default value in order to retrieve more or fewer hits, we can add a size parameter to the search request body.
By default, Elasticsearch uses the date format provided in the <field> 's mapping. This value overrides that mapping format. For valid syntax, see format . If a format or date value is incomplete, the range query replaces any missing components with default values.
Elasticsearch uses search relevance to score documents of a dataset. It returns an ordered list of data sorted by a relevance score. We can customize the score by adding and modifying variables that will shift the scale between precision and recall.
You can use a top_hits
aggregation that groups on the country
field, returns 1 doc per group, and orders the docs by the collected date descending:
POST /test/_search?search_type=count { "aggs": { "group": { "terms": { "field": "country" }, "aggs": { "group_docs": { "top_hits": { "size": 1, "sort": [ { "collected": { "order": "desc" } } ] } } } } } }
For those like user1892775 who run into "Fielddata is disabled on text fields by default...", you can create a multi field (https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html). So you might have mapping like:
"mapping": {
"properties": {
"country": {"type": "string", "fields": {"raw": {"type": "string", "index": "not_analyzed"}}}
}
Then your query would look like
POST /test/_search?search_type=count
{
"aggs": {
"group": {
"terms": {
"field": "country.raw"
},
"aggs": {
"group_docs": {
"top_hits": {
"size": 1,
"sort": [
{
"collected": {
"order": "desc"
}
}
]
}
}
}
}
}
}
(Note the use of country.raw)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With