Elasticsearch read and write consistency

Tags:

elasticsearch

eventual-consistency

Elasticsearch doesn't have a "read consistency" param (like Cassandra does), but it does have "write consistency" and "read preference".

The documentation says the following about write consistency:

Write Consistency
To prevent writes from taking place on the "wrong" side of a network partition, by default, index operations only succeed if a quorum (>replicas/2+1) of active shards are available. This default can be overridden on a node-by-node basis using the action.write_consistency setting. To alter this behavior per-operation, the consistency request parameter can be used.

Valid write consistency values are one, quorum, and all.

Note, for the case where the number of replicas is 1 (total of 2 copies of the data), then the default behavior is to succeed if 1 copy (the primary) can perform the write.

The index operation only returns after all active shards within the replication group have indexed the document (sync replication).

My question is about the last paragraph:

The index operation only returns after all active shards within the replication group have indexed the document (sync replication).

If write_consistency=quorum (the default) and all shards are live (no node failures, no network partition), then:
1) Does the index operation return as soon as a quorum of shards has finished indexing (even though all shards are live/active)?
2) Or does the index operation return only when all live/active shards have finished indexing (i.e. the quorum is considered only in case of failures/timeouts)?

In the first case, reads may be eventually consistent (may return stale data), but writes are quicker.
In the second case, reads are consistent (as long as there are no network partitions), but writes are slower (since they wait for the slowest shard/node).

Does anyone know how it works?

Another thing I wonder about is why the default value of the 'preference' param (in get/search requests) is randomized rather than _local (which I suppose would be more efficient).

asked Jul 16 '16 18:07

Vladimir Sorokin

1 Answer

I think I can answer my own question now :)

Regarding the first question: after re-reading the documentation (this and this) a few times :) I realized that the following statement should be right:

The index operation returns when all live/active shards have finished indexing, regardless of the consistency param. The consistency param can only prevent the operation from starting if there are not enough available shards (nodes).

So, for example, if there are 3 copies of a shard (one primary and two replicas) and all of them are available, the operation waits for all 3, regardless of the consistency param (even when consistency=one).
This makes the system consistent (at least the document API part), unless there is a network partition. However, I haven't had a chance to test this yet.

UPDATE: by consistency here I don't mean ACID consistency; I just mean the guarantee that all replicas are updated by the time the request returns.
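
To make the behavior described above concrete, here is a minimal Python sketch of my reading of the docs. It is a toy model, not Elasticsearch code: the function names are made up for illustration, and the quorum arithmetic (more than half of all copies, with the documented exception for a single replica) is my interpretation of the quoted formula.

```python
def required_active_copies(consistency: str, total_copies: int) -> int:
    """Copies that must be active before a write is even attempted.

    Hypothetical helper: models the pre-check only, as I read the docs.
    """
    if consistency == "one":
        return 1
    if consistency == "all":
        return total_copies
    if consistency == "quorum":
        # Quoted note: with 1 replica (2 copies in total), the primary
        # alone is enough for the write to proceed.
        if total_copies <= 2:
            return 1
        # Assumed reading of "quorum": more than half of all copies.
        return total_copies // 2 + 1
    raise ValueError(f"unknown consistency level: {consistency!r}")


def index_document(consistency: str, total_copies: int, active_copies: int) -> int:
    """Return how many copies the write waits for, or raise if the
    pre-check fails. Hypothetical model of sync replication."""
    needed = required_active_copies(consistency, total_copies)
    if active_copies < needed:
        raise RuntimeError(
            f"need {needed} active copies, only {active_copies} available"
        )
    # Once the pre-check passes, the consistency level no longer matters:
    # the operation returns only after every active copy has indexed.
    return active_copies
```

In this model, `index_document("one", 3, 3)` still waits for all 3 copies, and the consistency level only matters when some copies are unavailable.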

Regarding the second question: the obvious answer is that it is randomized to spread the load. On the other hand, a client could pick a random node to talk to, but that is probably not 100% efficient, since a single request may need multiple shards.
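
For reference, both settings discussed here are plain request parameters, so trying them out is cheap. The sketch below only builds the request URLs and sends nothing. The host, index, type and id values are placeholders I made up, and the `consistency` parameter is assumed to be available as in the version of the docs quoted in the question.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"  # hypothetical local node

VALID_CONSISTENCY = {"one", "quorum", "all"}


def index_url(index: str, doc_type: str, doc_id: str,
              consistency: str = "quorum") -> str:
    """URL for an index request with a per-operation consistency level."""
    if consistency not in VALID_CONSISTENCY:
        raise ValueError(f"unknown consistency level: {consistency!r}")
    query = urlencode({"consistency": consistency})
    return f"{BASE_URL}/{index}/{doc_type}/{doc_id}?{query}"


def get_url(index: str, doc_type: str, doc_id: str,
            preference: Optional[str] = None) -> str:
    """URL for a get request; without a preference, the default
    (randomized) copy selection applies."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    if preference is not None:
        url += "?" + urlencode({"preference": preference})
    return url


# Placeholder index/type/id values, for illustration only.
print(index_url("blog", "post", "1", consistency="one"))
print(get_url("blog", "post", "1", preference="_local"))
```

Passing `preference="_local"` asks the receiving node to prefer its own shard copies, which is the alternative to the randomized default that the question asks about.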

answered Sep 22 '22 22:09

Vladimir Sorokin
