Elasticsearch sort by children

There are two entities: collection and product. A collection is the parent of its products.

I need to search by product terms and show collections with 4 products each.

Collections and products may match only partially, but the best matches must come first. When a match is not full, some terms take priority over others.

Example: a search for "color:red" and "material:stone" must show red stones first, then anything else that is red (this applies both to matching collections and to matching products).

All of this is solved by the request below:

{
  "query": {
    "has_child": {
      "type": "products",
      "query": {
        "bool": {
          "should": [
            {
              "constant_score": {
                "filter": {
                  "match_all": {}
                },
                "boost": 1
              }
            },
            {
              "constant_score": {
                "filter": {
                  "terms": { "_name": "colors", "colors": [5] }
                },
                "boost": 1.2
              }
            },
            {
              "constant_score": {
                "filter": {
                  "terms": { "_name": "materials", "productTypes": [6] }
                },
                "boost": 1
              }
            }
          ]
        }
      },
      "score_mode": "max",
      "inner_hits": {
        "size": 4,
        "sort": [
          "_score"
        ]
      }
    }
  },
  "sort": [
    "_score"
  ]
}

OK, now the trouble.

I need to sort by price, both ASC and DESC. Price is a property of the product.

The sort must use the price of the matched products, so I can't move the price to the collection. Both collections and products must be sorted by price: collections by the minimum (or maximum) price of their matched products.

Only 100% matched products need to be sorted by price (partially matched products can be sorted too, but they come after). In other words, the sort must behave like ORDER BY _score, price.

Here is the result I want when sorting by price ASC, where [nn] marks a partially matched product:

Collection1 100 - 200 - 800 - [99]
Collection2 300 - 500 - [10] - [20]
Collection3 400 - 450 - 500 - [100]
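To make the target ordering concrete, here is a minimal Python sketch of "ORDER BY _score, price" applied in memory. It is not Elasticsearch code: the Product type, the full_match flag (standing in for a 100% match versus a [nn] partial match) and the sample data are assumptions made up for illustration.

```python
from typing import NamedTuple


class Product(NamedTuple):
    price: int
    full_match: bool  # False corresponds to the [nn] notation in the example


# Hypothetical data mirroring the example; input order is deliberately shuffled.
collections = {
    "Collection3": [Product(100, False), Product(500, True), Product(400, True), Product(450, True)],
    "Collection1": [Product(800, True), Product(99, False), Product(100, True), Product(200, True)],
    "Collection2": [Product(20, False), Product(500, True), Product(10, False), Product(300, True)],
}


def product_key(p: Product):
    # Full matches first (False sorts before True), then price ascending.
    return (not p.full_match, p.price)


def collection_key(item):
    _, products = item
    full_prices = [p.price for p in products if p.full_match]
    # Collections with no full match at all sort after every other collection.
    return (not full_prices, min(full_prices, default=0))


ordered = [
    (name, sorted(products, key=product_key)[:4])
    for name, products in sorted(collections.items(), key=collection_key)
]

for name, products in ordered:
    print(name, [p.price for p in products])
```

The two sort keys are the behavior the query has to reproduce: inside a collection, match quality outranks price, and between collections only the cheapest fully matched product counts.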

I found that sorting by a child field is not supported, and the suggestion is to recalculate the score instead. But I already use the score to sort by match quality. My attempt was:

{
  "query": {
    "has_child": {
      "type": "products",
      "query": {
        "function_score": {
          "query": {
            "bool": {
              "should": [
                ... same query as above ...
              ]
            }
          },
          "functions": [
            {
              "script_score": {
                "script": "ceil(_score * 100) * 100000 + (99999 - doc['price'].value/100)",
                "lang": "expression"
              }
            }
          ]
        }
      },
      "score_mode": "max",
      "inner_hits": {
        "size": 4,
        "sort": [
          "_score",
          {
            "price": {
              "order": "desc"
            }
          }
        ]
      }
    }
  },
  "sort": [
    "_score"
  ]
}

But I'm really confused by the scores I see in the response. Asking for help :) Or should I drop this and create a nested index instead?

UPD: I found what was wrong with the score. By default, Elasticsearch combines the query score with the result of script_score by multiplying them (boost_mode defaults to multiply). So the score was (ceil(_score * 100) * 100000 + (99999 - doc['price'].value/100)) * _score, which can break the idea but is easy to fix with the boost_mode parameter of function_score. The resulting query:

{
  "query": {
    "has_child": {
      "type": "products",
      "query": {
        "function_score": {
          "query": {
            "bool": {
              "should": [
                ... same query as above ...
              ]
            }
          },
          "functions": [
            {
              "script_score": {
                "script": "ceil((log10(_score)+10) * 100) * 100000 + (99999 - doc['price'].value)",
                "lang": "expression"
              }
            }
          ],
          "boost_mode": "replace"
        }
      },
      "score_mode": "max",
      "inner_hits": {
        "size": 4,
        "sort": [
          "_score",
          {
            "price": {
              "order": "desc"
            }
          }
        ]
      }
    }
  },
  "sort": [
    "_score"
  ]
}

boost_mode: "replace" means "use the function result as the score". I also used log10 to bound the number of digits that _score contributes. To sort by price DESC, change the formula to ceil((log10(_score)+10) * 100) * 100000 + (doc['price'].value)
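As a rough illustration of why the boost_mode matters, the sketch below models the two combination modes in plain Python using the first-attempt formula. The function names and sample values are hypothetical, and the model covers only the multiply and replace modes discussed here, not the full function_score behavior.

```python
import math


def script_score(score: float, price: float) -> float:
    # First-attempt formula: score band in the high digits, price in the low digits.
    return math.ceil(score * 100) * 100000 + (99999 - price / 100)


def final_score(score: float, price: float, boost_mode: str = "multiply") -> float:
    fn = script_score(score, price)
    if boost_mode == "multiply":  # the default: query score times function result
        return score * fn
    if boost_mode == "replace":   # function result is used as the score as-is
        return fn
    raise ValueError(f"mode not modelled in this sketch: {boost_mode}")


# Sample values chosen so that the arithmetic is exact in binary floating point.
print(final_score(2.0, 5000, "replace"))   # band 200, price part 99949
print(final_score(2.0, 5000, "multiply"))  # same value multiplied by the score
```

With replace, the high digits of the result are the score band and the low digits are the price part, which is the layout the formula was designed around. With multiply, the whole number is scaled by _score again, so the digits no longer line up with that layout.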

UPD2

The formula ceil((log10(_score)+10) * 100) * 100000 + (99999 - doc['price'].value) returns 100099952 for both price 48 and price 50 (boost == 1, queryNorm == 1) because of the single-precision limitation of the score.

The new formula is ceil((log10(_score)+5) * 100) * 10000 + (9999 - ceil(log10(doc['price'].value) * 1000)): I reduced the number of digits used for the score, switched from the price to log10 of the price, and reduced the number of digits there too. Feedback welcome.
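The collision can be reproduced outside Elasticsearch. The sketch below evaluates the formula in Python and then rounds the result to a 32-bit float with the struct module; the helper names are made up for illustration, and it assumes _score == 1 as in the case described.

```python
import math
import struct


def to_float32(x: float) -> float:
    # Round-trip through a 4-byte IEEE 754 float to emulate single precision.
    return struct.unpack("f", struct.pack("f", float(x)))[0]


def old_formula(score: float, price: float) -> float:
    return math.ceil((math.log10(score) + 10) * 100) * 100000 + (99999 - price)


for price in (48, 50):
    exact = old_formula(1.0, price)
    print(price, exact, to_float32(exact))
```

Near 100,000,000 adjacent single-precision values are 8 apart, so the exact results 100099951 and 100099949 both round to 100099952 and the price difference is lost.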
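A quick way to check the new formula is to evaluate it in Python and round the result to single precision. This is only a sketch with hypothetical helper names, again assuming _score == 1. It also shows the trade-off of switching to log10 of the price: prices that are close together on a logarithmic scale fall into the same bucket.

```python
import math
import struct


def to_float32(x: float) -> float:
    # Round-trip through a 4-byte IEEE 754 float to emulate single precision.
    return struct.unpack("f", struct.pack("f", float(x)))[0]


def new_formula(score: float, price: float) -> int:
    score_part = math.ceil((math.log10(score) + 5) * 100) * 10000
    price_part = 9999 - math.ceil(math.log10(price) * 1000)
    return score_part + price_part


for price in (48, 50, 10001, 10002):
    value = new_formula(1.0, price)
    print(price, value, to_float32(value))
```

For these inputs the results stay below 2^24 (16,777,216), the range in which a 32-bit float represents every integer exactly, so prices 48 and 50 now get distinct scores and the cheaper one ranks higher. Prices 10001 and 10002, on the other hand, produce the same value, so their relative order is not decided by this score.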

Dmitry MiksIr asked Sep 28 '16 13:09


1 Answer

Thanks for sharing. I updated the latest formula to ceil((log10(_score+1)+5) * 100) * 10000 + (9999 - ceil(log10(doc['price'].value +1) * 1000)), adding +1 to the score because in some cases it returned errors like this:

 "function score query returned an invalid score: -Infinity for doc: 4580" 

Update: Got another error:

 "function score query returned an invalid score: NaN for doc: 1739" 

I changed the formula to ceil((log10(_score+1)+5) * 100) * 10000 + (9999 - ceil(log10(doc['price'].value +1) * 1000)), adding +1 to the doc value to fix this.

Update 2: Got another error:

 "function score query returned an invalid score: NaN for doc: 1739" 

I changed the formula to ceil((log10(_score+1)+5) * 100) * 10000 + (9999 - ceil(log10(doc['price'].value > 0 ? doc['price'].value : 1) * 1000)), replacing the +1 with a conditional expression.
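The invalid scores are consistent with how log10 behaves at and below zero. The sketch below is a Python model, not the Lucene expression itself: log10_ieee is a hypothetical helper that copies the documented behavior of Java's Math.log10 (negative infinity at 0, NaN below 0), on the assumption that the expression language follows the same rules. Python's own math.log10 raises an exception for such inputs instead.

```python
import math


def log10_ieee(x: float) -> float:
    # Hypothetical helper modelled on Java's Math.log10 edge cases.
    if math.isnan(x) or x < 0:
        return math.nan
    if x == 0:
        return -math.inf
    return math.log10(x)


def guarded_formula(score: float, price: float) -> int:
    # Formula from "Update 2": +1 on the score, non-positive prices replaced by 1.
    safe_price = price if price > 0 else 1
    score_part = math.ceil((log10_ieee(score + 1) + 5) * 100) * 10000
    price_part = 9999 - math.ceil(log10_ieee(safe_price) * 1000)
    return score_part + price_part


print(log10_ieee(0))   # -inf: what an unguarded log10(_score) gives for a zero score
print(log10_ieee(-5))  # nan: what an unguarded log10 gives for a negative value
print([guarded_formula(0.0, p) for p in (-5, 0, 48)])
```

With both guards in place every input to log10 is at least 1 for non-negative scores, so the result stays finite for zero or negative prices as well.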

Update 3: Got another error:

I no longer have the error message and it's hard to find now, but it was similar to the previous one :(

I changed the formula to ceil(_score+1) + ceil((doc['price'].value > 0 ? doc['price'].value : 1) * 100), a simplified formula that I can understand, and it is still working today :)

gatisr answered Oct 11 '22 13:10