I'm having issue with scoring: when I run the same query multiple times, each documents are not scored the same way. I found out that the problem is well known, it's the bouncing result issue.
A bit of context: I have multiple shards across multiple nodes (60 shards, 10 data nodes), all the nodes are using ES 2.3 and we're heavily using nested document - the example query doesn't use them, for simplicity.
I tried to resolve it by using the preference
search parameter, with a custom value. The documentation states:
A custom value will be used to guarantee that the same shards will be used for the same custom value. This can help with "jumping values" when hitting different shards in different refresh states. A sample value can be something like the web session id, or the user name.
However, when I run this query multiple times:
GET myindex/_search?preference=asfd
{
"query": {
"term": {
"has_account": {
"value": "twitter"
}
}
}
}
I end up having the same documents, but with different scoring/sorting. If I enable explain
, I can see that those documents are coming from different shards.
If I use preference=_primary
or preference=_replica
, we have the expected behavior (always the same shard, always the same scoring/sorting) but I can't query only one or the other...
I also experimented with search_type=dfs_search_then_fetch
, which should generate the scoring based on the whole index, across all shards, but I still get different scoring for each run of the query.
So in short, how do I ensure the score and the sorting of the results of a query stay the same during a user's session?
Looks like my replicas went out of sync with the primaries. No idea why, but deleting the replicas and recreating them have "fixed" the problem... I'll need some investigations on why it went out of sync
Edit 21/10/2016
Regarding the "preference" option not being taken into account, it's linked to the AWS zone awareness: if the preferred replica is in another zone than the client node, then the preference will be ignored.
The differences between the replicas are "normal" if you delete (or update) documents, from my understanding the deleted document count will vary between the replicas, since they're not necessarily merging segments at the same time.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With