I am struggling to make boosting work the way I want it to in Elasticsearch.
Let's say I have some profiles indexed containing gender, interests and age, and that I consider a matching gender most relevant, then a matching interest, with the user's age as the least important criterion. I was expecting the query below to order the matching profiles according to that principle, but when I execute it I get some males first, and then the female Anna, aged 50, before the female Maria, who likes cars. Why doesn't Maria get a higher score than Anna?
{
"query": {
"bool" : {
"should" : [
{ "term" : { "gender" : { "term": "male", "boost": 10.0 } } },
{ "term" : { "likes" : { "term": "cars", "boost" : 5.0 } } },
{ "range" : { "age" : { "from" : 50, "boost" : 1.0 } } }
],
"minimum_number_should_match" : 1
}
}
}
Hints will be greatly appreciated,
Stine
These are the curl commands executed:
$ curl -XPUT http://localhost:9200/users/profile/1 -d '{
"nickname" : "bob",
"gender" : "male",
"age" : 48,
"likes" : "airplanes"
}'
$ curl -XPUT http://localhost:9200/users/profile/2 -d '{
"nickname" : "carlos",
"gender" : "male",
"age" : 24,
"likes" : "food"
}'
$ curl -XPUT http://localhost:9200/users/profile/3 -d '{
"nickname" : "julio",
"gender" : "male",
"age" : 18,
"likes" : "ladies"
}'
$ curl -XPUT http://localhost:9200/users/profile/4 -d '{
"nickname" : "maria",
"gender" : "female",
"age" : 25,
"likes" : "cars"
}'
$ curl -XPUT http://localhost:9200/users/profile/5 -d '{
"nickname" : "anna",
"gender" : "female",
"age" : 50,
"likes" : "clothes"
}'
$ curl -XGET http://localhost:9200/users/profile/_search -d '{
"query": {
"bool" : {
"should" : [
{ "term" : { "gender" : { "term": "male", "boost": 10.0 } } },
{ "term" : { "likes" : { "term": "cars", "boost" : 5.0 } } },
{ "range" : { "age" : { "from" : 50, "boost" : 1.0 } } }
],
"minimum_number_should_match" : 1
}
}
}'
The boost value is not absolute: it is combined with other factors to determine the relevance of each term.
You have two "genders" (I would assume) but many different "likes". So male is considered almost irrelevant, because it occurs so frequently within your data. However, cars may only occur a few times, and thus is considered to be much more relevant.
This logic is useful for full text search, but not for enums, which are intended to be used essentially as filters.
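To make the frequency effect concrete, here is a minimal sketch of the inverse document frequency (IDF) factor as defined by Lucene's classic TF/IDF similarity, idf = 1 + ln(numDocs / (docFreq + 1)). It assumes all five sample profiles sit in a single shard and ignores every other scoring factor (query normalisation, coord, field norms), so it will not reproduce the real _score values. The function name classic_idf is made up for illustration.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """IDF as in Lucene's classic TF/IDF similarity: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Term counts taken from the five sample profiles in the question.
# Assumption: all documents live in one shard, so counts are global.
NUM_DOCS = 5
doc_freq = {"male": 3, "cars": 1}

idf = {term: classic_idf(df, NUM_DOCS) for term, df in doc_freq.items()}

for term, value in idf.items():
    print(f"{term}: docFreq={doc_freq[term]} idf={value:.3f}")
```

The rarer term gets the larger factor, so the effective weight of a clause is the boost multiplied by data-dependent terms, not the boost alone.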
Fortunately, you can disable this functionality on a per-field basis using omit_term_freq_and_positions and omit_norms.
Try setting your mapping as follows:
curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1' -d '
{
"mappings" : {
"test" : {
"properties" : {
"likes" : {
"index" : "not_analyzed",
"omit_term_freq_and_positions" : 1,
"omit_norms" : 1,
"type" : "string"
},
"gender" : {
"index" : "not_analyzed",
"omit_term_freq_and_positions" : 1,
"omit_norms" : 1,
"type" : "string"
},
"age" : {
"type" : "integer"
}
}
}
}
}
'
UPDATE: Full working example:
Delete the existing index:
curl -XDELETE 'http://127.0.0.1:9200/users/?pretty=1'
Create the index with the new mapping:
curl -XPUT 'http://127.0.0.1:9200/users/?pretty=1' -d '
{
"mappings" : {
"profile" : {
"properties" : {
"likes" : {
"index" : "not_analyzed",
"omit_term_freq_and_positions" : 1,
"type" : "string",
"omit_norms" : 1
},
"age" : {
"type" : "integer"
},
"gender" : {
"index" : "not_analyzed",
"omit_term_freq_and_positions" : 1,
"type" : "string",
"omit_norms" : 1
}
}
}
}
}
'
Index the test docs:
curl -XPOST 'http://127.0.0.1:9200/users/profile/_bulk?pretty=1' -d '
{"index" : {"_id" : 1}}
{"nickname" : "bob", "likes" : "airplanes", "age" : 48, "gender" : "male"}
{"index" : {"_id" : 2}}
{"nickname" : "carlos", "likes" : "food", "age" : 24, "gender" : "male"}
{"index" : {"_id" : 3}}
{"nickname" : "julio", "likes" : "ladies", "age" : 18, "gender" : "male"}
{"index" : {"_id" : 4}}
{"nickname" : "maria", "likes" : "cars", "age" : 25, "gender" : "female"}
{"index" : {"_id" : 5}}
{"nickname" : "anna", "likes" : "clothes", "age" : 50, "gender" : "female"}
'
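If you would rather generate the bulk payload than type it by hand, the following is a rough sketch of how the same newline-delimited body could be built. The helper name bulk_body is hypothetical; the only things taken from the example above are the action line / source line pairing and the sample profiles. It only builds the string and does not send it anywhere.

```python
import json

# The five sample profiles from the question, keyed by document id.
profiles = {
    1: {"nickname": "bob", "likes": "airplanes", "age": 48, "gender": "male"},
    2: {"nickname": "carlos", "likes": "food", "age": 24, "gender": "male"},
    3: {"nickname": "julio", "likes": "ladies", "age": 18, "gender": "male"},
    4: {"nickname": "maria", "likes": "cars", "age": 25, "gender": "female"},
    5: {"nickname": "anna", "likes": "clothes", "age": 50, "gender": "female"},
}


def bulk_body(docs: dict) -> str:
    """Build a _bulk payload: one action line followed by one source line per doc."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API expects every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body(profiles)
print(body, end="")
```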
Refresh the index (to be sure that the latest docs are visible to search):
curl -XPOST 'http://127.0.0.1:9200/users/_refresh?pretty=1'
Search:
curl -XGET 'http://127.0.0.1:9200/users/profile/_search?pretty=1' -d '
{
"query" : {
"bool" : {
"minimum_number_should_match" : 1,
"should" : [
{
"term" : {
"gender" : {
"boost" : 10,
"term" : "male"
}
}
},
{
"term" : {
"likes" : {
"boost" : 5,
"term" : "cars"
}
}
},
{
"range" : {
"age" : {
"boost" : 1,
"from" : 50
}
}
}
]
}
}
}
'
Results:
# {
# "hits" : {
# "hits" : [
# {
# "_source" : {
# "nickname" : "bob",
# "likes" : "airplanes",
# "age" : 48,
# "gender" : "male"
# },
# "_score" : 0.053500723,
# "_index" : "users",
# "_id" : "1",
# "_type" : "profile"
# },
# {
# "_source" : {
# "nickname" : "carlos",
# "likes" : "food",
# "age" : 24,
# "gender" : "male"
# },
# "_score" : 0.053500723,
# "_index" : "users",
# "_id" : "2",
# "_type" : "profile"
# },
# {
# "_source" : {
# "nickname" : "julio",
# "likes" : "ladies",
# "age" : 18,
# "gender" : "male"
# },
# "_score" : 0.053500723,
# "_index" : "users",
# "_id" : "3",
# "_type" : "profile"
# },
# {
# "_source" : {
# "nickname" : "anna",
# "likes" : "clothes",
# "age" : 50,
# "gender" : "female"
# },
# "_score" : 0.029695695,
# "_index" : "users",
# "_id" : "5",
# "_type" : "profile"
# },
# {
# "_source" : {
# "nickname" : "maria",
# "likes" : "cars",
# "age" : 25,
# "gender" : "female"
# },
# "_score" : 0.015511602,
# "_index" : "users",
# "_id" : "4",
# "_type" : "profile"
# }
# ],
# "max_score" : 0.053500723,
# "total" : 5
# },
# "timed_out" : false,
# "_shards" : {
# "failed" : 0,
# "successful" : 5,
# "total" : 5
# },
# "took" : 4
# }
UPDATE: Alternative approach
Here, I present an alternative query which, while more verbose, gives you a much more predictable result. It involves using the custom filters score query. First, we filter the docs down to docs that match at least one of the conditions. Because we use the constant score query, all docs have an initial score of 1.
The custom filters score allows us to boost each doc if it matches a filter:
curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1' -d '
{
"query" : {
"custom_filters_score" : {
"query" : {
"constant_score" : {
"filter" : {
"or" : [
{
"term" : {
"gender" : "male"
}
},
{
"term" : {
"likes" : "cars"
}
},
{
"range" : {
"age" : {
"gte" : 50
}
}
}
]
}
}
},
"score_mode" : "total",
"filters" : [
{
"boost" : "10",
"filter" : {
"term" : {
"gender" : "male"
}
}
},
{
"boost" : "5",
"filter" : {
"term" : {
"likes" : "cars"
}
}
},
{
"boost" : "1",
"filter" : {
"range" : {
"age" : {
"gte" : 50
}
}
}
}
]
}
}
}
'
You will see that the scores associated with each doc are nice round numbers, which are easily traced back to the matched clauses:
# [Fri Jun 8 21:30:24 2012] Response:
# {
# "hits" : {
# "hits" : [
# {
# "_source" : {
# "nickname" : "bob",
# "likes" : "airplanes",
# "age" : 48,
# "gender" : "male"
# },
# "_score" : 10,
# "_index" : "users",
# "_id" : "1",
# "_type" : "profile"
# },
# {
# "_source" : {
# "nickname" : "carlos",
# "likes" : "food",
# "age" : 24,
# "gender" : "male"
# },
# "_score" : 10,
# "_index" : "users",
# "_id" : "2",
# "_type" : "profile"
# },
# {
# "_source" : {
# "nickname" : "julio",
# "likes" : "ladies",
# "age" : 18,
# "gender" : "male"
# },
# "_score" : 10,
# "_index" : "users",
# "_id" : "3",
# "_type" : "profile"
# },
# {
# "_source" : {
# "nickname" : "maria",
# "likes" : "cars",
# "age" : 25,
# "gender" : "female"
# },
# "_score" : 5,
# "_index" : "users",
# "_id" : "4",
# "_type" : "profile"
# },
# {
# "_source" : {
# "nickname" : "anna",
# "likes" : "clothes",
# "age" : 50,
# "gender" : "female"
# },
# "_score" : 1,
# "_index" : "users",
# "_id" : "5",
# "_type" : "profile"
# }
# ],
# "max_score" : 10,
# "total" : 5
# },
# "timed_out" : false,
# "_shards" : {
# "failed" : 0,
# "successful" : 20,
# "total" : 20
# },
# "took" : 6
# }
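The arithmetic behind those scores can be checked by hand. Below is a small sketch, not Elasticsearch code, that mimics what score_mode "total" does in this query: every doc starts at the constant score of 1, and the boosts of all matching filters are summed and multiplied in. The names FILTERS and total_score are invented for the illustration, and the no-match fallback is an assumption that is never exercised here because the "or" filter guarantees at least one match.

```python
# Filters and boosts copied from the custom_filters_score query above.
FILTERS = [
    (10, lambda doc: doc["gender"] == "male"),
    (5, lambda doc: doc["likes"] == "cars"),
    (1, lambda doc: doc["age"] >= 50),
]

PROFILES = [
    {"nickname": "bob", "likes": "airplanes", "age": 48, "gender": "male"},
    {"nickname": "carlos", "likes": "food", "age": 24, "gender": "male"},
    {"nickname": "julio", "likes": "ladies", "age": 18, "gender": "male"},
    {"nickname": "maria", "likes": "cars", "age": 25, "gender": "female"},
    {"nickname": "anna", "likes": "clothes", "age": 50, "gender": "female"},
]


def total_score(doc: dict, base_score: float = 1.0) -> float:
    """Multiply the base score by the sum of the boosts of all matching filters."""
    matched = [boost for boost, matches in FILTERS if matches(doc)]
    # Assumption: a doc that matches no filter keeps its base score unchanged.
    return base_score * sum(matched) if matched else base_score


scores = {p["nickname"]: total_score(p) for p in PROFILES}

# Print the profiles from highest to lowest score.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(name, score)
```

A profile that matched several filters would get the sum of their boosts, e.g. a male who likes cars would score 15 under this scheme.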