Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Elasticsearch: How to search, sort, limit the results then sort again?

This isn't about multi-level sorting.

I need my results first selected by distance, limited to 50, then those 50 sorted by price.

select * 
from
(
select top 50 * from mytable order by distance asc)
)
order by price asc

Essentially, the second sort throws away the ordering of the inner sort - but the inner sort is used to hone in on the top 50 results.

The other answers I've seen for this sort of question looks at second-level sorting, which is not what I'm after.

BTW: I've looked at aggregations - Top N results, but I'm not sure I can apply a sort on the aggregation result sort. Also looked at rescore, but I don't know where to put my 'sorts'

like image 346
rpsingh Avatar asked Oct 11 '16 00:10

rpsingh


People also ask

How does Elasticsearch sort work?

Elasticsearch comes with a good default out of the box. It sorts the results by relevance to the search query term, most relevant first. Elasticsearch measures the relevance score as a floating-point number called _score, and orders results in the descending order of their _score values.


2 Answers

A top hits aggregation will allow you to sort on a separate field, in your case price from the main query sort (on distance). See the documentation here for how to specify sorting in the top hits agg.

It'll look a little like this (which assumes distance is a double type; if it's a geo-location type, use the documentation provided by Volodymyr Bilyachat.)

{
   "sort":[
      {
         "distance":"asc"
      }
   ],
   "query":{
      "match_all":{}
   },
   "size":50,
   "aggs":{
      "top_price_hits":{
         "top_hits":{
            "sort":[
               {
                  "price":{
                     "order":"asc"
                  }
               }
            ],
            "size":50
         }
      }
   }
}

However, if there are only 50 results that you want from your primary query, why don't you just sort in the application client side? This would be a better approach as using a top hits aggregation for a secondary sort is a slight abuse of its purpose.

The in-application approach would be more robust.

like image 74
Phil Avatar answered Nov 15 '22 07:11

Phil


+1'ed the accepted answer, but I wanted to make sure you were aware of how search scoring, can often deliver a better user experience than traditional sorting.

Based on your current strategy, one could say:

  1. Distance is important, relatively speaking (e.g. top 50 closest) but not in absolute terms (e.g. must be within 50mi).

  2. You only want to show 50 results.

  3. You want those results to be sorted by price (or perhaps alphabetically).


However, if you find yourself trying to generalize about which result a searcher is most likely to choose, you may discover a function of price and distance (or other features) which better models the real-world likelihood of a searcher choosing a particular result.

E.g. Say you discover that

  • Users will pay more for the convenience of a nearby result
  • Users will travel greater distances for greater discounts

Then you could model a sample scoring function that generates a result ordering based on this relationship.

E.g. 1/price + 1/distance ... which would generate a higher score as either price or distance decreased.

Which could be generalized to P * 1/price + 1/distance where P represented a tuning coefficient expressing the relative importance of price vs distance.


Armed with this model, you could then write a function score query which would output ordered results with the optimal combinations of price and distance for your users.

like image 42
Peter Dixon-Moses Avatar answered Nov 15 '22 07:11

Peter Dixon-Moses