
How can MarkLogic have both consistency and availability?

The CAP theorem seems logical to me. I understand that:

If I want consistency in a distributed system, I have to wait for every transaction to propagate. The cost of ACID is the time it takes to replicate data across the whole network.

But how can MarkLogic have both: ACID and a distributed system without lag?
So is it possible to have BASE and ACID properties in the same database?
So is the CAP theorem wrong?

jeremieca asked Aug 07 '15 11:08


2 Answers

Availability in CAP Theorem is about the hosts that are on either side of the partition, not about the system as a whole.

In CAP Theorem you are "Available" if all hosts on either side of a network partition can continue to accept both read and update transactions. Most of our customers don't care whether every host remains available in the face of a network partition. They care that the database as a whole remains available during a network partition. So if the cluster has replicated or shared data so that there is enough data on both sides of the partition to continue to serve queries, and is smart enough to know which side of the partition should remain available and which should gracefully bow out, then the database can remain available in the face of a network partition, even if not every host does. That's what MarkLogic does within a cluster.
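To make the "which side should remain available" decision concrete, here is a minimal sketch of a majority-quorum rule, one common way to make that choice. The answer does not say which rule MarkLogic actually uses, so treat the rule itself and the names `side_stays_available` and `surviving_side` as assumptions made for illustration.

```python
from typing import Optional


def side_stays_available(reachable_hosts: int, cluster_size: int) -> bool:
    """Return True if this side of the partition holds a strict majority."""
    return reachable_hosts > cluster_size // 2


def surviving_side(sides: dict, cluster_size: int) -> Optional[str]:
    """Return the name of the side that keeps serving requests, if any.

    `sides` maps a side name to the set of hosts that can still reach
    each other on that side (hypothetical representation).
    """
    for name, hosts in sides.items():
        if side_stays_available(len(hosts), cluster_size):
            return name
    return None  # no side has a majority, so no side may accept updates


# A 5-host cluster split 3/2: only the 3-host side keeps serving.
partition = {"east": {"h1", "h2", "h3"}, "west": {"h4", "h5"}}
winner = surviving_side(partition, cluster_size=5)

# A 4-host cluster split 2/2: neither side has a strict majority.
tie = surviving_side({"a": {"h1", "h2"}, "b": {"h3", "h4"}}, cluster_size=4)
```

Because at most one side can hold a strict majority, two sides can never both accept updates, so the data stays consistent while the database as a whole stays reachable through the majority side.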

Between clusters, MarkLogic has many options for how close to absolutely consistent you want to be. We use asynchronous replication to move data between clusters, so if there is a network partition between clusters, the data may not be consistent between those clusters. You can control the lag limit, that is, how far behind the replica is allowed to fall, so that you can tune this, and if you need absolute consistency between clusters, we have ways of achieving that as well.
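The trade-off behind a lag limit can be sketched with a toy model of asynchronous replication. This is not MarkLogic code: the class `AsyncReplicatedStore`, its methods, and the policy of refusing writes once the limit is exceeded are all assumptions chosen to illustrate the idea.

```python
from collections import deque


class AsyncReplicatedStore:
    """Toy master/replica pair with asynchronous replication and a lag limit."""

    def __init__(self, lag_limit: float) -> None:
        self.lag_limit = lag_limit  # maximum tolerated replica lag, in seconds
        self.master = {}
        self.replica = {}
        self.pending = deque()      # queued (commit_time, key, value) tuples

    def lag(self, now: float) -> float:
        """Age of the oldest write the replica has not yet applied."""
        return now - self.pending[0][0] if self.pending else 0.0

    def write(self, key: str, value: str, now: float) -> bool:
        """Commit on the master, or refuse if the replica is too far behind."""
        if self.lag(now) > self.lag_limit:
            return False  # favor consistency: stall updates until caught up
        self.master[key] = value
        self.pending.append((now, key, value))
        return True

    def ship(self) -> None:
        """Apply every queued write to the replica (the link is healthy)."""
        while self.pending:
            _, key, value = self.pending.popleft()
            self.replica[key] = value


store = AsyncReplicatedStore(lag_limit=5.0)
store.write("a", "1", now=0.0)              # accepted, replica starts lagging
store.write("b", "2", now=3.0)              # accepted, lag is 3 s
blocked = store.write("c", "3", now=10.0)   # refused, lag is 10 s
store.ship()                                # partition heals, replica catches up
accepted = store.write("c", "3", now=11.0)  # accepted again
```

A small lag limit keeps the clusters close to consistent at the cost of availability during a partition; a large one does the opposite.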

Bottom line is that:

  • Customers care mostly that their database or data services remain available, not that any specific host remain available, so we focus on availability of the system and can provide that without violating CAP Theorem.
  • Multi-cluster MarkLogic deployments can be tuned to give you the right balance of consistency and availability in the face of a network partition.

Hope that helps.

David Gorbet answered Sep 30 '22 06:09


The CAP theorem is not wrong; it's just outdated. Here's the update from the author: CAP Twelve Years Later: How the "Rules" Have Changed.

MarkLogic supports ACID properties via MVCC. If you like, you could configure it to behave with BASE properties instead. The key, as I understand it, is to design and optimize for your production requirements. MarkLogic has a host of replication features available and we're constantly adding to that portfolio as our customers solve real-world problems deploying globally-distributed clusters.
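For readers unfamiliar with MVCC (multi-version concurrency control), here is a minimal sketch of the general idea: every commit creates a new timestamped version, and a reader sees only versions committed at or before its snapshot timestamp. The class `MVCCStore` and its methods are hypothetical and say nothing about how MarkLogic implements MVCC internally.

```python
class MVCCStore:
    """Toy multi-version store: writes never overwrite, they add versions."""

    def __init__(self) -> None:
        self.clock = 0      # logical commit timestamp
        self.versions = {}  # key -> list of (commit_ts, value), ascending

    def commit(self, key: str, value: str) -> int:
        """Store a new version of `key` and return its commit timestamp."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read(self, key: str, snapshot_ts: int):
        """Return the newest value committed at or before `snapshot_ts`."""
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None  # the key did not exist at that snapshot


db = MVCCStore()
snapshot = db.commit("doc", "v1")  # a reader starts at this timestamp
db.commit("doc", "v2")             # a later writer adds a new version

old_view = db.read("doc", snapshot)  # the reader still sees its snapshot
new_view = db.read("doc", db.clock)  # a fresh reader sees the latest commit
```

Readers never block writers in this scheme, because a write adds a version instead of changing the one a reader is looking at.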

Have you read Inside MarkLogic Server? That white paper does a great job of explaining how MarkLogic solves many of these challenges.

Sam Mefford answered Sep 30 '22 04:09