
Handling durability requirements failure in Couchbase

Recently I've started investigating Couchbase Server as a candidate for a project. The particular scenario I'm looking at right now is how to make Couchbase act as a "source of truth", which is why I'm digging into the durability aspect.

So here is a snippet from ACID Properties and Couchbase:

If the durability requirements fail, then Couchbase may still save the document and eventually distribute it across the cluster. All we know is that it didn’t succeed as far as the SDK knows. You can choose to act on this information to introduce more ACID properties into your application.

So imagine the following. I insert or update a document, and the primary node fails before the data makes it to any replica. Let's say the primary is gone for a long time. At this point I don't know whether the data was written to disk. The scary part is that "Couchbase may still save the document and eventually distribute it across the cluster". In other words, as far as the client can tell, the data didn't make it, so the user sees an error, but then the document may suddenly appear in the system if the primary comes back online.

Am I reading this statement correctly? If I am, what's the best practice to handle it with Couchbase?

Kiryl asked Jan 28 '23 08:01


2 Answers

An update for this question:

Couchbase 6.5 introduced support for transactions:

// transferAmount and InsufficientFunds are assumed to be defined elsewhere
transactions.run((txnctx) -> {
    // Get the account documents for Andy and Beth
    TransactionJsonDocument andy = txnctx.getOrError(collection, "Andy");
    JsonObject andyContent = andy.contentAsObject();
    int andyBalance = andyContent.getInt("account_balance");
    TransactionJsonDocument beth = txnctx.getOrError(collection, "Beth");
    JsonObject bethContent = beth.contentAsObject();
    int bethBalance = bethContent.getInt("account_balance");

    // If Beth has sufficient funds, move the amount from Beth to Andy
    if (bethBalance > transferAmount) {
        andyContent.put("account_balance", andyBalance + transferAmount);
        txnctx.replace(andy, andyContent);
        bethContent.put("account_balance", bethBalance - transferAmount);
        txnctx.replace(beth, bethContent);
    } else {
        // Throwing from the lambda aborts the transaction
        throw new InsufficientFunds();
    }
    // Commit the transaction; if omitted, it is committed automatically
    txnctx.commit();
});

Durability has also been improved. You can now choose between three levels: majority, majorityAndPersistToActive, and persistToMajority.
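To make the difference between the three levels concrete, here is a minimal sketch in plain Java. It is a toy model and not the Couchbase SDK: the `Level`, `CopyState`, and `DurabilityCheck` names are made up for illustration, and "majority" is assumed to mean a strict majority of all configured copies (active plus replicas).

```java
import java.util.List;

/** Toy model of the three durability levels; not the Couchbase SDK enum. */
enum Level { MAJORITY, MAJORITY_AND_PERSIST_TO_ACTIVE, PERSIST_TO_MAJORITY }

/** State of one copy (active or replica) of a document. Hypothetical type. */
final class CopyState {
    final boolean active;    // true for the active copy, false for a replica
    final boolean inMemory;  // the mutation has reached this copy's memory
    final boolean onDisk;    // the mutation has been persisted on this copy

    CopyState(boolean active, boolean inMemory, boolean onDisk) {
        this.active = active;
        this.inMemory = inMemory;
        this.onDisk = onDisk;
    }
}

final class DurabilityCheck {
    /** Smallest number of copies that forms a strict majority. */
    static int majority(int totalCopies) {
        return totalCopies / 2 + 1;
    }

    /** Returns true when the given copies satisfy the requested level. */
    static boolean satisfied(Level level, List<CopyState> copies) {
        int needed = majority(copies.size());
        long inMemory = copies.stream().filter(c -> c.inMemory).count();
        long onDisk = copies.stream().filter(c -> c.onDisk).count();
        boolean activeOnDisk = copies.stream().anyMatch(c -> c.active && c.onDisk);
        switch (level) {
            case MAJORITY:
                return inMemory >= needed;
            case MAJORITY_AND_PERSIST_TO_ACTIVE:
                return inMemory >= needed && activeOnDisk;
            case PERSIST_TO_MAJORITY:
                return onDisk >= needed;
            default:
                throw new IllegalArgumentException("unknown level: " + level);
        }
    }
}
```

In this model, the scenario from the question (one active copy that has the write, one replica that never received it) satisfies none of the three levels, so the client would be told the write was not durable.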

Read more:

  • https://blog.couchbase.com/distributed-multi-document-acid-transactions-in-couchbase/
  • https://blog.couchbase.com/couchbase-transactions-java-api/

deniswsrosa answered Feb 19 '23 20:02


Short answer:

Turn on auto-failover, and you'll be fine.

Longer answer:

It sounds like you're worried about a pretty narrow edge case here. Here's my understanding:

  1. You save a document with the SDK and give it a persist_to durability requirement.
  2. Couchbase acknowledges the document was saved to memory.
  3. The SDK starts to check to make sure it is persisted to disk and/or is replicated.
  4. During an extremely brief window of time, the node goes down. The document was persisted to disk but wasn't replicated to another node and the primary node isn't failed over.
  5. The SDK operation will return an error, saying that it didn't meet durability requirements. (I may be wrong about this, it may return a different error, which means you could act on it differently).
  6. You notify the user that something failed.
  7. The node comes back up, rejoins the cluster, and the document is there.
  8. Confused user?

If that's correct, the key is step 4. First of all, this seems like a pretty rare edge case. All three of those things have to be true to worry about this situation. My Couchbase internals knowledge isn't rock solid, so that situation may not be possible (but I'll keep going as if it were).
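One way to avoid the "confused user" in step 8 is to never report a plain failure when the durability check is inconclusive. The sketch below is plain Java with made-up names (`DurableStore`, `AmbiguousDurabilityException`, `WriteOutcome`, `DurableWriter`); it is not the Couchbase SDK, and it assumes the write is an idempotent upsert so that repeating it is safe.

```java
/** Thrown when a write's durability could not be confirmed. Hypothetical type. */
class AmbiguousDurabilityException extends RuntimeException {
    AmbiguousDurabilityException(String message) {
        super(message);
    }
}

/** Stand-in for a bucket that supports durable upserts. Hypothetical interface. */
interface DurableStore {
    void durableUpsert(String id, String content);
}

/** What the application may honestly tell the user. */
enum WriteOutcome { CONFIRMED, UNKNOWN }

final class DurableWriter {
    /**
     * Retries an idempotent upsert until durability is confirmed or the
     * attempts run out. An unconfirmed write is reported as UNKNOWN,
     * never as a definite failure, because the document may still exist.
     */
    static WriteOutcome write(DurableStore store, String id, String content, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                store.durableUpsert(id, content);
                return WriteOutcome.CONFIRMED;
            } catch (AmbiguousDurabilityException e) {
                // Durability not confirmed; repeating the same upsert is safe.
            }
        }
        return WriteOutcome.UNKNOWN;
    }
}
```

With an UNKNOWN result, the UI can say something like "we could not confirm your change, please check again shortly" instead of "failed", so a document that shows up later does not contradict what the user was told.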

If you're running Couchbase on a good network and good machines, then network splits and nodes going down shouldn't happen very often. You can therefore turn on automatic failover. Remember that our document never made it to a replica. So when a failover happens, the replica that takes over never received it, and as far as the cluster is concerned the document is gone for good (and since you told the user the save failed, no confusion).
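The failover argument can be illustrated with a toy two-node model in plain Java. This is only a sketch of my understanding, not how Couchbase is implemented: `Node` and `TwoNodeCluster` are made-up names, and the model assumes that a node rejoining after failover is resynchronised from the new active copy, discarding writes that were never replicated.

```java
import java.util.HashMap;
import java.util.Map;

/** One data node holding its own copy of the documents. Hypothetical model. */
final class Node {
    final String name;
    final Map<String, String> docs = new HashMap<>();

    Node(String name) {
        this.name = name;
    }
}

/** Toy cluster with one active copy and at most one replica. */
final class TwoNodeCluster {
    private Node active;
    private Node replica;

    TwoNodeCluster(Node active, Node replica) {
        this.active = active;
        this.replica = replica;
    }

    /** Writes to the active copy only; replication is a separate step. */
    void write(String id, String content) {
        active.docs.put(id, content);
    }

    /** Brings the replica (if any) up to date with the active copy. */
    void replicate() {
        if (replica != null) {
            replica.docs.clear();
            replica.docs.putAll(active.docs);
        }
    }

    /** Promotes the replica and returns the node that was failed over. */
    Node failover() {
        Node failed = active;
        active = replica;
        replica = null;
        return failed;
    }

    /** Re-adds a node as replica, resynchronised from the current active copy. */
    void rejoin(Node node) {
        node.docs.clear();
        node.docs.putAll(active.docs);
        replica = node;
    }

    /** Reads from the active copy; returns null when the document is absent. */
    String read(String id) {
        return active.docs.get(id);
    }
}
```

In this model, a write that reached only the old active node is not visible after `failover()`, and it does not come back when that node is passed to `rejoin(...)`, which matches what the user was told.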

Again, I am not an expert on Couchbase internals, so this is all to the best of my knowledge, but it sounds like all you need to do is turn on auto-failover, and you'll be fine. It's off by default; the idea is that you should understand what it is first and choose to opt in. But for most systems, use auto-failover.

Matthew Groves answered Feb 19 '23 18:02