
Elasticsearch sort: by one from each group and repeat

I need to take the item with the max value from each name, then the next-highest item from each name, and repeat until no items remain.

I'll explain with a simple example. I have these items:

Name | Value
-----------
AAA | 12
AAA | 35
AAA | 5
BBB | 1
BBB | 10
BBB | 5

Expected result after sort:

Name | Value
-----------
AAA | 35
BBB | 10
AAA | 12
BBB | 5
AAA | 5
BBB | 1

I know how to do this in Postgres (with window functions: rank() over()), but is it possible in Elasticsearch?

Vitalii Ponomar asked Jul 09 '18 05:07



2 Answers

You need something like a group-by with a max aggregation.

Here is an example:

GET /yourindex/_search
{
  "size": 0,
  "aggs": {
    "yourGroup": {
      "terms": {
        "field": "Name",
        "size": 10
      },
      "aggs": {
        "theMax": {
          "max": {
            "field": "Value"
          }
        }
      }
    }
  }
}

Reference:- this
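
Note that the max aggregation above returns only the single highest Value per Name. If the lower-ranked items of each group are needed as well, a top_hits sub-aggregation sorted by Value descending is one possible variation. The sketch below only builds such a request body as a Python dict and prints it; it does not send anything to a cluster. The function name build_top_per_group_body, the aggregation name topItems and the per_group parameter are hypothetical; the field names Name and Value come from the question.

```python
import json


def build_top_per_group_body(group_field="Name", value_field="Value",
                             groups=10, per_group=3):
    """Build a search body: terms buckets on group_field,
    each carrying its top documents ordered by value_field (descending)."""
    return {
        "size": 0,  # no top-level hits, only aggregation results
        "aggs": {
            "yourGroup": {
                "terms": {"field": group_field, "size": groups},
                "aggs": {
                    # hypothetical aggregation name; top_hits keeps the
                    # best `per_group` documents inside every bucket
                    "topItems": {
                        "top_hits": {
                            "size": per_group,
                            "sort": [{value_field: {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


print(json.dumps(build_top_per_group_body(), indent=2))
```

The per-bucket hits still come back grouped by Name, so the interleaved order asked for in the question has to be produced on the client side. As far as I know, top_hits is also capped per bucket by an index setting, so this only suits small groups.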

Mihir Dave answered Sep 27 '22 17:09



Aggregating my comments here.

To answer your direct question: no, this is not possible to my knowledge. But there are workarounds where Elasticsearch could help.

Showing more than 1 million records is a bad idea in Elasticsearch, no matter how those documents are sorted. My questions in the comments were meant to find out whether it would be appropriate to create a second ES index holding the results of probably one query plus post-processing, something like the "first 1000 records" (a list of documents a human could reasonably read), and to update that list periodically (every 10 seconds or so). You could use Watcher to build this index and keep it updated. As I said, 1 million records is both impractical (who would look at 1 million docs?) and not performant from the ES point of view.

Basically, keep a separate index that contains only the first 1000 documents, sorted according to your requirements. It is this index that gets updated regularly, not your main one with 1 million documents. Regarding pagination and 1 million documents: how many pages do you believe your users will go through? 10, 15, 20? Not even google.com gives you everything, only a few tens of pages, even though there can be hundreds of millions of matches. Keep in mind that Elasticsearch is a search engine, not a database. The aim is to give you the best-matching docs, not all of them.

The query from Watcher will run over all the documents in your main index. It will aggregate the documents according to your requirements (I think a terms aggregation on Name, ordered by Value); you can then add a post-processing step to create the order you need and index the result into a second index. The next time the watch triggers, it will delete the old index, perform the same query again and index the new results into the (now empty) index.
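
The post-processing step mentioned above could look roughly like the following sketch. It assumes the documents have already been fetched as (Name, Value) pairs; the function name interleave_by_rank is hypothetical. It numbers the items inside each Name from highest to lowest Value, then orders everything by that position and, as an assumption taken from the expected output in the question, by Name within each round. Ties in Value get distinct positions here, which is closer to row_number() than to rank().

```python
from collections import defaultdict


def interleave_by_rank(items):
    """Order (name, value) pairs so every name contributes its best item first,
    then its second-best, and so on until all items are used."""
    groups = defaultdict(list)
    for name, value in items:
        groups[name].append(value)

    ranked = []
    for name, values in groups.items():
        # position 0 is the highest value within this name
        for position, value in enumerate(sorted(values, reverse=True)):
            ranked.append((position, name, value))

    # round by round; within a round, names in ascending order (assumption)
    ranked.sort(key=lambda r: (r[0], r[1]))
    return [(name, value) for _, name, value in ranked]


items = [("AAA", 12), ("AAA", 35), ("AAA", 5),
         ("BBB", 1), ("BBB", 10), ("BBB", 5)]
for name, value in interleave_by_rank(items):
    print(f"{name} | {value}")
```

With the sample data from the question this prints the rows in the order AAA 35, BBB 10, AAA 12, BBB 5, AAA 5, BBB 1. Groups of unequal size simply drop out of later rounds once they run out of items.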

Andrei Stefan answered Sep 27 '22 17:09
