After reading a lot of documentation, I understand that the primary_term and the sequence_number are used together for optimistic concurrency control, to prevent an older version of a document from overwriting a newer one. However, my question is: what exactly is the primary_term? Is it the same as the primary shard?
From the docs:
To ensure an older version of a document doesn’t overwrite a newer version, every operation performed to a document is assigned a sequence number by the primary shard that coordinates that change.
Let's say your index is made up of 5 primary shards (that was the default prior to version 7). Index and update requests are performed against primary shards. If you have multiple primary shards, Elasticsearch can distribute incoming requests (e.g. huge bulk requests) across them and process them in parallel, which improves performance.
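As a rough illustration of how requests spread across primary shards, here is a minimal sketch. It assumes the simplified rule "hash of the routing value modulo the number of primary shards", with the document ID as the routing value. The function `pick_shard` is hypothetical, and `zlib.crc32` is only a stand-in: Elasticsearch uses its own Murmur3-based hash, so the actual shard numbers will differ.

```python
import zlib


def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Map a routing value (by default the document ID) to a shard number.

    Hypothetical helper: CRC32 is a stand-in for Elasticsearch's real hash.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# Ten example document IDs distributed over 5 primary shards.
doc_ids = [f"doc-{i}" for i in range(10)]
placement = {doc_id: pick_shard(doc_id, 5) for doc_id in doc_ids}

for doc_id, shard in placement.items():
    print(f"{doc_id} -> primary shard #{shard}")
```

Note that the shard number chosen here is unrelated to the primary_term; it only decides where a document lives.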
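However, the primary_term does not tell you which of those primary shards (#1, #2, #3, #4 or #5 in this example) executed or coordinated the change. It is a counter kept per shard that increases each time a new primary is assigned for that shard.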
Q: Is it the same as the primary shard?
A: No. The primary_term is neither a shard identifier nor the number of primary shards. Each shard has its own primary term, and on a freshly created index it typically starts at 1 for every shard, however many primary shards the index has.
The primary term increments every time a different shard copy becomes primary during failover. This helps resolve conflicts between changes made on an old primary that comes back online and changes made on the new primary (the new primary wins).
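The failover behaviour described above can be sketched with a toy model. This is not Elasticsearch code: the `ShardCopy` class and its fields are hypothetical, and the only rule modelled is that a shard copy ignores operations stamped with a primary term older than the newest one it has seen.

```python
from dataclasses import dataclass, field


@dataclass
class ShardCopy:
    """Toy stand-in for one copy of a shard (hypothetical, for illustration)."""
    name: str
    known_term: int = 1
    ops: list = field(default_factory=list)

    def apply(self, primary_term: int, seq_no: int, payload: str) -> bool:
        # Reject operations that come from an older generation of primary.
        if primary_term < self.known_term:
            return False
        # Remember the newest term seen and record the operation.
        self.known_term = primary_term
        self.ops.append((primary_term, seq_no, payload))
        return True


replica = ShardCopy("replica-of-shard-0")

# Normal operation: the original primary is on term 1.
accepted_before = replica.apply(1, 0, "write from original primary")

# Failover: another copy is promoted, so the term becomes 2.
accepted_new = replica.apply(2, 1, "write from new primary")

# The old primary comes back and replays a write still stamped with term 1.
accepted_stale = replica.apply(1, 1, "late write from old primary")

print(accepted_before, accepted_new, accepted_stale)
```

The last call is refused even though its sequence number collides with an accepted one, because the term alone shows that it came from a superseded primary.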
These primary terms are incremental and change when a primary is promoted. They're persisted in the cluster state, thus representing a sort of “version” or “generation” of primaries that the cluster is on.
https://www.elastic.co/blog/elasticsearch-sequence-ids-6-0
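To tie this back to the optimistic concurrency control mentioned in the question: in Elasticsearch, a write can carry the `if_seq_no` and `if_primary_term` parameters, and it is rejected with a version conflict if the stored document no longer has that pair. The sketch below only imitates that check in memory. `ToyShard` and `VersionConflict` are hypothetical names, and no Elasticsearch client is involved.

```python
class VersionConflict(Exception):
    """Raised when the expected (seq_no, primary_term) pair does not match."""


class ToyShard:
    """In-memory imitation of one primary shard (hypothetical)."""

    def __init__(self, primary_term: int = 1):
        self.primary_term = primary_term
        self.next_seq_no = 0
        self.docs = {}  # doc_id -> (source, seq_no, primary_term)

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self.docs.get(doc_id)
        if if_seq_no is not None or if_primary_term is not None:
            # Conditional write: the stored pair must match exactly.
            expected = (if_seq_no, if_primary_term)
            if current is None or (current[1], current[2]) != expected:
                raise VersionConflict(doc_id)
        # The primary assigns the next sequence number to this operation.
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        self.docs[doc_id] = (source, seq_no, self.primary_term)
        return seq_no, self.primary_term


shard = ToyShard()
seq_no, term = shard.index("1", {"stock": 10})

# First writer updates using the pair it read: accepted.
shard.index("1", {"stock": 9}, if_seq_no=seq_no, if_primary_term=term)

# Second writer still holds the old pair: rejected.
try:
    shard.index("1", {"stock": 8}, if_seq_no=seq_no, if_primary_term=term)
    conflict = False
except VersionConflict:
    conflict = True

print("conflict detected:", conflict)
```

Checking both values matters: a sequence number alone could be reused by a different primary after a failover, whereas the pair identifies one specific operation.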