
Elasticsearch preference set to a custom value, but documents are still returned from different shards

I'm having an issue with scoring: when I run the same query multiple times, the documents are not scored the same way each time. I found out that this is a well-known problem, the "bouncing results" issue.

A bit of context: I have multiple shards across multiple nodes (60 shards, 10 data nodes), all the nodes run ES 2.3, and we use nested documents heavily (the example query below doesn't use them, for simplicity).

I tried to resolve it by using the preference search parameter with a custom value. The documentation states:

A custom value will be used to guarantee that the same shards will be used for the same custom value. This can help with "jumping values" when hitting different shards in different refresh states. A sample value can be something like the web session id, or the user name.
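As a rough illustration of the session-based approach the documentation suggests, here is a minimal sketch that derives a stable custom preference string from a web session id and puts it on the _search URL. The helper names, the host http://localhost:9200, and the choice to hash the session id are assumptions for illustration only; the raw session id would also work. Hashing to hex just guarantees the value never starts with "_", since values beginning with "_" are interpreted as built-in preferences such as _primary or _local.

```python
import hashlib
from urllib.parse import urlencode


def session_preference(session_id: str) -> str:
    """Hypothetical helper: map a session id to a stable custom preference.

    A hex digest never starts with "_", so it cannot be mistaken for a
    built-in preference like _primary or _local.
    """
    return hashlib.sha256(session_id.encode("utf-8")).hexdigest()[:16]


def search_url(base: str, index: str, session_id: str) -> str:
    """Hypothetical helper: build the _search URL carrying the preference."""
    params = urlencode({"preference": session_preference(session_id)})
    return f"{base}/{index}/_search?{params}"


# The same session always yields the same preference, so (in theory)
# the same shard copies are used for every request in that session.
url = search_url("http://localhost:9200", "myindex", "user-42-session")
print(url)
```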

However, when I run this query multiple times:

GET myindex/_search?preference=asfd
{
  "query": {
    "term": {
      "has_account": {
        "value": "twitter"
      }
    }
  }
}

I get the same documents back, but with different scores and ordering. If I enable explain, I can see that those documents come from different shards. If I use preference=_primary or preference=_replica, I get the expected behavior (always the same shard, always the same scoring and ordering), but restricting queries to only one or the other isn't an option...

I also experimented with search_type=dfs_query_then_fetch, which should compute the scoring statistics across the whole index (all shards), but I still get different scores on each run of the query.
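For reference, the DFS search type is spelled dfs_query_then_fetch. As far as I understand it, DFS adds an extra round-trip that collects term and document frequencies from the shard copies selected for the search and scores with the aggregated values. Since those statistics come from whichever copies were picked, a primary and replica that disagree (for example on unmerged deleted documents) can still produce different scores if different copies are hit. The sketch below only builds the request path and body, combining a custom preference with DFS; the helper name is hypothetical and nothing is sent to a cluster.

```python
import json
from urllib.parse import urlencode


def dfs_search_request(index: str, preference: str, query: dict):
    """Hypothetical helper: return (path, body) for a search that combines
    a custom preference with the DFS search type."""
    params = urlencode({
        "preference": preference,
        "search_type": "dfs_query_then_fetch",  # note: not dfs_search_then_fetch
    })
    path = f"/{index}/_search?{params}"
    body = json.dumps({"query": query})
    return path, body


path, body = dfs_search_request(
    "myindex",
    "asfd",
    {"term": {"has_account": {"value": "twitter"}}},
)
print(path)
print(body)
```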

So in short, how do I ensure the score and the sorting of the results of a query stay the same during a user's session?

haltabush asked Sep 13 '25 15:09


1 Answer

Looks like my replicas went out of sync with the primaries. I have no idea why, but deleting the replicas and recreating them has "fixed" the problem... I'll need to investigate why they went out of sync.
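One way to "delete and recreate" replicas is to drop number_of_replicas to 0 through the index _settings API and then restore it, so new replicas are recovered from the primaries. The sketch below only builds (and does not send) those two PUT requests; the host, the index name, and the original replica count of 1 are assumptions. While replicas are at 0 the index has no redundancy, so it is probably wise to wait for the cluster to settle (e.g. via GET _cluster/health/myindex?wait_for_status=green) before and after restoring.

```python
import json
import urllib.request


def set_replicas_request(base: str, index: str, replicas: int) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a PUT /<index>/_settings
    request that changes number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Step 1: remove all replicas. Step 2: restore them (assumed original count: 1).
drop = set_replicas_request("http://localhost:9200", "myindex", 0)
restore = set_replicas_request("http://localhost:9200", "myindex", 1)
# To apply for real: urllib.request.urlopen(drop), wait for the cluster to
# settle, then urllib.request.urlopen(restore).
```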

Edit 21/10/2016

Regarding the "preference" option not being taken into account: it's related to AWS zone awareness. If the preferred replica is in a different zone from the client node, the preference is ignored.

The differences between the replicas are "normal" if you delete (or update) documents. From my understanding, the deleted document count will vary between replicas, since they don't necessarily merge segments at the same time.
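To see why unmerged deletes change scores, assume the default similarity of ES 2.x, which is Lucene's classic TF/IDF. If I read Lucene's implementation correctly, idf is computed as 1 + ln(maxDoc / (docFreq + 1)), and both maxDoc and docFreq still count deleted documents until a merge removes them. The numbers below are invented purely for illustration: one copy of a shard has already merged away 50 deleted documents (20 of which contained the term), while the other copy still carries them.

```python
import math


def classic_idf(doc_freq: int, max_doc: int) -> float:
    """Classic TF/IDF idf: 1 + ln(maxDoc / (docFreq + 1)).

    Both counts include deleted documents not yet merged away.
    """
    return 1.0 + math.log(max_doc / (doc_freq + 1))


# Hypothetical counts for two copies of the same shard.
primary_idf = classic_idf(doc_freq=100, max_doc=1000)  # deletes already merged
replica_idf = classic_idf(doc_freq=120, max_doc=1050)  # deletes still present

print(f"primary idf: {primary_idf:.4f}")
print(f"replica idf: {replica_idf:.4f}")
```

Same documents, same query, but a different idf depending on which copy answers, which is exactly the kind of score "bouncing" described in the question.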

haltabush answered Sep 17 '25 21:09


