I have a use case where I want to replicate a single database on multiple servers (for HA and scalability purposes).
Would there be any disadvantage to running a 3-node replica set instead of a 3-node cluster?
From my understanding, a cluster is a set of servers or nodes, while a replica set is a set of servers or nodes, each of which has a replication mechanism built in, for access during downtime and faster read operations.
Unlike scaling out with MySQL replication, Cluster allows you to scale writes just as well as reads. New data nodes or MySQL servers can be added to an existing Cluster with no loss of service to the application.
The management and configuration are similar to server-to-server replication. You configure these computers and their storage in a cluster-to-cluster configuration, where one cluster replicates its own set of storage with another cluster and its set of storage.
The CouchDB docs (section 11.2) provide an example cluster configuration:
[cluster]
q=8
r=2
w=2
n=3
q - The number of shards.
r - The number of copies of a document with the same revision that have to be read before CouchDB returns a 200 and the document. If only one copy of the document is accessible, that copy is returned with a 200.
w - The number of nodes that need to save a document before the write is acknowledged with a 201. If the number of nodes saving the document is 0, a 202 is returned.
n - The number of copies of every document, i.e. replicas.
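As a rough illustration of how these numbers relate, here is a minimal Python sketch. The function names are hypothetical and this is plain arithmetic on the parameters, not CouchDB's own code: it only shows that r=2 and w=2 correspond to a majority of n=3 copies, and how many shard copies q and n imply in total.

```python
def majority(n: int) -> int:
    """Smallest number of copies that forms a majority of n replicas."""
    return n // 2 + 1


def shard_copies(q: int, n: int) -> int:
    """Total shard copies in the cluster: q shards, each stored n times."""
    return q * n


# Values taken from the example [cluster] section.
q, n = 8, 3
print(majority(n))         # 2, which matches r=2 and w=2
print(shard_copies(q, n))  # 24 shard copies to spread across the nodes
```

With q=1 and n=3 the same arithmetic gives 3 shard copies, i.e. one full copy of the database per node.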
The behavior of your 3-node replica set should be equivalent to:
[cluster]
q=1
r=1
w=1
n=3
when replicating correctly. This is a possible clustering configuration, but not an optimal one, as it lacks:
Confirmation that multiple nodes, and a majority of them, have stored a save before it is acknowledged.
Confirmation that multiple nodes, and a majority of them, agree a revision is correct before it is returned.
Expandability of the database beyond a single node's storage via sharding.
The ability to change to any configuration equivalent to cluster parameters with q, r or w > 1 without switching to a cluster.
Indirectly, the limits on acknowledgements create more potential conflicts to resolve between the replicas if the replicas are actually used for network scalability, and greater odds of an actual inconsistency in the form of lost records if a node fails between acknowledging a save and passing it on to the other replicas.
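The difference between the two configurations can be sketched with generic quorum arithmetic. This is an assumption-laden illustration in Python with hypothetical function names, not a description of CouchDB's internals: it checks whether every read is forced to touch at least one copy that took part in the last acknowledged write, and how many copies may still be missing a write when it is acknowledged.

```python
def quorums_overlap(r: int, w: int, n: int) -> bool:
    """True when any r copies read must include one of the w copies written."""
    return r + w > n


def unconfirmed_copies(w: int, n: int) -> int:
    """Copies that may not yet hold a write when it is acknowledged."""
    return n - w


cluster = {"r": 2, "w": 2, "n": 3}  # the docs example
replica = {"r": 1, "w": 1, "n": 3}  # the replica-equivalent settings

for name, cfg in (("cluster", cluster), ("replica", replica)):
    overlap = quorums_overlap(cfg["r"], cfg["w"], cfg["n"])
    pending = unconfirmed_copies(cfg["w"], cfg["n"])
    print(name, "overlap:", overlap, "unconfirmed copies:", pending)
```

With r=1 and w=1 the read and write sets need not overlap, and two of the three copies may lag behind an acknowledged save, which is where the extra conflicts and the window for lost records come from.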
Which version of CouchDB will you be using? If 2.0.0+, there's probably no reason not to use true clustering.
The only reason I can think of to use replicas instead of clustering would be ease of configuration, or because your database version (i.e. CouchDB < 2.0.0) doesn't support clustering.
But if you use clustering, even on just 3 nodes now, you're already set up for greater expansion later, just by adding more nodes.
Is there a reason you might not want to use a cluster?