I would like to merge the rankings obtained from querying separate fields of an Elasticsearch index, so as to obtain a "compound" ranking.
As a (silly) "matchmaking" example, suppose I wanted to retrieve best-matching results on an index of people containing their favorite music, food, sports.
The separate queries could be e.g.
"query": { "match" : { "music" : "indie classical metal" } }
which would yield me as ranked results:
"query": { "match" : { "foods" : "falafel strawberries coffee" } }
yielding
and
"query": { "match" : { "sports" : "basketball ski" } }
yielding
Now, I would like to obtain an "aggregate" ranking based on the rankings above, e.g. using the voting methods listed in "How to merge a collection of ordered preferences".
So far, to achieve something along these lines, I have used compound queries such as
"query": {
  "bool": {
    "should": [
      { "match" : { "music" : "indie classical metal" } },
      { "match" : { "foods" : "falafel strawberries coffee" } },
      { "match" : { "sports" : "basketball ski" } }
    ]
  }
}
or
"query": {
  "dis_max": {
    "queries": [
      { "match" : { "music" : "indie classical metal" } },
      { "match" : { "foods" : "falafel strawberries coffee" } },
      { "match" : { "sports" : "basketball ski" } }
    ]
  }
}
but (AFAIK) these don't do what I am looking for, since they combine the per-clause scores rather than the ranks. I understand it's fairly straightforward to post-process the rankings (e.g. using elasticsearch-py and then a few lines of Python), but is it possible to do the above directly with an Elasticsearch query?
(bonus question: could you suggest alternative strategies to merge rankings from multiple fields, beyond bool+should and dis_max that I could try out?)
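For reference, the Python post-processing mentioned above could look roughly like the sketch below. It assumes each per-field search has already been reduced to an ordered list of document ids (for example, the hit ids of each separate query), and it uses a Borda count as one possible voting method. The function name `borda_merge` and the three example rankings are hypothetical, not actual query output.

```python
from collections import defaultdict


def borda_merge(rankings):
    """Merge ranked lists of document ids with a Borda count.

    Assumption: an id missing from a ranking earns 0 points there.
    """
    # Every id that appears in at least one ranking is a candidate.
    candidates = {doc_id for ranking in rankings for doc_id in ranking}
    n = len(candidates)
    points = defaultdict(int)
    for ranking in rankings:
        for position, doc_id in enumerate(ranking):
            # First place earns n - 1 points, second n - 2, and so on.
            points[doc_id] += n - 1 - position
    # Highest total first; ties are broken by id to keep the order stable.
    return sorted(candidates, key=lambda doc_id: (-points[doc_id], doc_id))


# Hypothetical per-field rankings, best match first.
music = ["alice", "charlie", "bob"]
foods = ["charlie", "alice", "bob"]
sports = ["alice", "bob", "charlie"]

merged = borda_merge([music, foods, sports])
print(merged)  # ['alice', 'charlie', 'bob']
```

Because only positions are used, the result does not depend on the scale of the per-field scores, which is the property that bool+should and dis_max lack.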
(See the document score model and the first strategy in Answer #1)
The second strategy is pure scripting.
Mapping
PUT /ranking_people_scripted
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "music": {
        "type": "keyword"
      },
      "foods": {
        "type": "keyword"
      },
      "sports": {
        "type": "keyword"
      }
    }
  }
}
Documents (see Answer #1)
Ranking scripted query
GET /ranking_people_scripted/_search?filter_path=hits.hits
{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": """
          // Count how many of the query terms occur among the field's values.
          int calculateFieldScore(List fieldTerms, List queryTerms) {
            def fieldScore = 0;
            for (def queryTerm : queryTerms) {
              if (fieldTerms.contains(queryTerm)) {
                fieldScore++;
              }
            }
            return fieldScore;
          }
          // Sum the boosted per-field match counts over all term sets.
          def documentScore = 0;
          def termSets = params.term_sets;
          for (def termSet : termSets) {
            def queryTerms = termSet.terms;
            def field = termSet.field;
            def fieldBoost = termSet.boost;
            // Doc values of the keyword field for the current document.
            def fieldTerms = doc[field];
            int fieldScore = calculateFieldScore(fieldTerms, queryTerms);
            documentScore += fieldScore * fieldBoost;
          }
          return documentScore;
        """,
        "params": {
          "term_sets": [
            {
              "terms": [
                "indie",
                "classical"
              ],
              "field": "music",
              "boost": 1
            },
            {
              "terms": [
                "strawberries",
                "coffee"
              ],
              "field": "foods",
              "boost": 1
            },
            {
              "terms": [
                "hockey",
                "basketball"
              ],
              "field": "sports",
              "boost": 1
            }
          ]
        }
      }
    }
  },
  "fields": [
    "name"
  ],
  "_source": false
}
Response
{
  "hits" : {
    "hits" : [
      {
        "_index" : "ranking_people_scripted",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 5.0,
        "fields" : {
          "name" : [
            "Alice"
          ]
        }
      },
      {
        "_index" : "ranking_people_scripted",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 4.0,
        "fields" : {
          "name" : [
            "Charlie"
          ]
        }
      },
      {
        "_index" : "ranking_people_scripted",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 3.0,
        "fields" : {
          "name" : [
            "Bob"
          ]
        }
      }
    ]
  }
}
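To make the scoring logic of the script easier to follow, here is a rough Python equivalent. It is only a sketch for reasoning about the numbers, not something Elasticsearch runs: the `term_sets` value is copied from the query parameters above, while the sample document and the function names `calculate_field_score` and `document_score` are hypothetical.

```python
def calculate_field_score(field_terms, query_terms):
    """Count how many query terms occur among the field's values."""
    return sum(1 for term in query_terms if term in field_terms)


def document_score(doc, term_sets):
    """Sum the boosted per-field match counts, as the Painless script does."""
    score = 0
    for term_set in term_sets:
        # A field missing from the document contributes nothing.
        field_terms = doc.get(term_set["field"], [])
        matches = calculate_field_score(field_terms, term_set["terms"])
        score += matches * term_set["boost"]
    return score


# Same term sets as in the query parameters above.
term_sets = [
    {"terms": ["indie", "classical"], "field": "music", "boost": 1},
    {"terms": ["strawberries", "coffee"], "field": "foods", "boost": 1},
    {"terms": ["hockey", "basketball"], "field": "sports", "boost": 1},
]

# Hypothetical document, not one of the indexed people.
sample_doc = {
    "music": ["indie", "metal"],
    "foods": ["coffee", "strawberries"],
    "sports": ["ski"],
}

sample_score = document_score(sample_doc, term_sets)
print(sample_score)  # 1 (music) + 2 (foods) + 0 (sports) = 3
```

Raising the `boost` of a term set makes matches in that field count for more, which is the only weighting knob the script exposes.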
You could also script a runtime field or use a script query.