Reading Nathan Hurst's Visual Guide to NoSQL Systems, he includes the CAP
triangle:
C
onsistencyA
vailibilityP
artition ToleranceWith SQL Server being an AC
system, and MongoDB being a CP
system.
These definitions from come a UC Berkley professor Eric Brewer, and his talk at PODC 2000 (Principles of Distributed Computing):
Availability
Availability means just that - the service is available (to operate fully or not as above). When you buy the book you want to get a response, not some browser message about the web site being uncommunicative. Gilbert & Lynch in their proof of CAP Theorem make the good point that availability most often deserts you when you need it most - sites tend to go down at busy periods precisely because they are busy. A service that's available but not being accessed is of no benefit to anyone.
What does it mean, in the context of MongoDB, or BigTable, for the system to not be "available"?
Do you go to connect (e.g. over TCP/IP), and the server does not respond? Do you attempt execute a query, but the query never returns - or returns an error?
What does it mean to not be available?
High-availability NoSQL databases are systems designed to run without interruption of service. Many web-based businesses require data services that are available without interruption. For example, databases that support online purchasing need to be available 24 hours a day, 7 days a week, 365 days a year.
Bigtable is a NoSQL database that is designed to support large, scalable applications.
Cloud Bigtable is a managed NoSQL database, intended for analytics and operational workloads. It is an alternative to HBase, a columnar database system that runs on HDFS.
NoSQL databases store data in documents rather than relational tables. Accordingly, we classify them as "not only SQL" and subdivide them by a variety of flexible data models. Types of NoSQL databases include pure document databases, key-value stores, wide-column databases, and graph databases.
Availability in this case means that in the event of a network partition, the server that a client connects to may not be able to guarantee the level of consistency that the client expects (or that the system is configured to provide).
Assuming that you have 3 nodes, A, B, and C, in a hypothetical distributed system. A, B, and C are each running in their own rack of servers, with 2 switches between them:
[Node A] <- Switch #1 -> [Node B] <- Switch #2 -> [ Node C ]
Now assume that said system is set up so that it is GUARANTEED that any write will go to at least 2 nodes before it is considered committed. Now, lets assume that switch #2 gets unplugged, and some client is connected to node C:
[Node A] <- Switch #1 -> [Node B] [ Node C ] <-- Some client
That client will not be able to issue Consistent writes, because the distributed system is currently in a partitioned state (namely, Node C cannot contact enough other nodes to guarantee the 2-node consistency required).
I'd add to this that some NoSQL databases allow very dynamic selection of CAP attributes. Cassandra, for instance, allows clients to specify the number of servers that a write must go to before it is committed on a per-write basis. Writes going to a single server are "AP", writes going to a quorum (or all) servers are more "CA".
EDIT - from the comments below:
In MongoDB you can only have master/slave configuration within a replica set. What this means is that the choice of AP vs CP is made by the client at query time. The client can specify slaveOk, which will read from an arbitrarily selected slave (which may have stale data): mongodb.org/display/DOCS/…. If the client is not OK with stale data, don't specify slaveOk and the query will go to the master. If the client cannot reach the master, then you'll get an error. I'm not sure exactly what that error will be.
The CAP theorem applies to distributed computer systems. MongoDB supports two distinct forms of distributed computing: sharding for horizontal scaling and replica sets for failover/high availability. The two can be used together or independently. I think the CAP theorem applies slightly differently to the two forms:
Sharding level - MongoDB stores data on at most one authoritative shard.
Strong Partition-tolerance: Even if network partitioned, requests never return incorrect/stale data. Shards continue working independent of other shards.
Weak Availability: Reads/writes of data on a downed shard will fail.
Replica set level - MongoDB replicates data within a shard, ensuring consistency via a single, authoritative primary node.
Strong Partition-tolerance: If enough nodes become unreachable, a new primary is elected. The election process ensures there is always at most one primary node.
Weak Availability: Reads/writes will fail when no primary exists, even though the data could be accessed via secondary nodes.
The slaveOK/ReadPreference.SECONDARY option sacrifices some consistency (stale data can be read) for increased performance and availability.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With