Couchbase node failure

Question

My understanding could be amiss here. As I understand it, Couchbase uses a smart client to automatically select which node to write to or read from in a cluster. What I DON'T understand is, when this data is written/read, is it also immediately written to all other nodes? If so, in the event of a node failure, how does Couchbase know to use a different node from the one that was 'marked as the master' for the current operation/key? Do you lose data in the event that one of your nodes fails?

This sentence from the Couchbase Server Manual gives me the impression that you do lose data (which would make Couchbase unsuitable for high availability requirements):

With fewer larger nodes, in case of a node failure the impact to the application will be greater

Thank you in advance for your time :)

m03geek · Accepted Answer

By default when data is written into couchbase client returns success just after that data is written to one node's memory. After that couchbase save it to disk and does replication.

If you want to ensure that data is persisted to disk in most client libs there is functions that allow you to do that. With help of those functions you can also enshure that data is replicated to another node. This function is called observe.

When one node goes down, it should be failovered. Couchbase server could do that automatically when Auto failover timeout is set in server settings. I.e. if you have 3 nodes cluster and stored data has 2 replicas and one node goes down, you'll not lose data. If the second node fails you'll also not lose all data - it will be available on last node.

If one node that was Master goes down and failover - other alive node becames Master. In your client you point to all servers in cluster, so if it unable to retreive data from one node, it tries to get it from another.

Also if you have 2 nodes in your disposal you can install 2 separate couchbase servers and configure XDCR (cross datacenter replication) and manually check servers availability with HA proxies or something else. In that way you'll get only one ip to connect (proxy's ip) which will automatically get data from alive server.

Tug Grall · Answer

Hopefully Couchbase is a good system for HA systems.

Let me explain in few sentence how it works, suppose you have a 5 nodes cluster. The applications, using the Client API/SDK, is always aware of the topology of the cluster (and any change in the topology).

When you set/get a document in the cluster the Client API uses the same algorithm than the server, to chose on which node it should be written. So the client select using a CRC32 hash the node, write on this node. Then asynchronously the cluster will copy 1 or more replicas to the other nodes (depending of your configuration).

Couchbase has only 1 active copy of a document at the time. So it is easy to be consistent. So the applications get and set from this active document.

In case of failure, the server has some work to do, once the failure is discovered (automatically or by a monitoring system), a "fail over" occurs. This means that the replicas are promoted as active and it is know possible to work like before. Usually you do a rebalance of the node to balance the cluster properly.

The sentence you are commenting is simply to say that the less number of node you have, the bigger will be the impact in case of failure/rebalance, since you will have to route the same number of request to a smaller number of nodes. Hopefully you do not lose data ;)

You can find some very detailed information about this way of working on Couchbase CTO blog: http://damienkatz.net/2013/05/dynamo_sure_works_hard.html

Note: I am working as developer evangelist at Couchbase

Couchbase node failure

Tags:

nodes

replication

couchbase

Tash Pemhiwa

2 Answers

m03geek

Tug Grall

Recent Activity

Donate For Us

Couchbase node failure

Tags:

nodes

replication

couchbase

Tash Pemhiwa

2 Answers

m03geek

Tug Grall

Related questions

Recent Activity

Donate For Us