While I try to understand the "Availability" (A) and "Partition tolerance" (P) in CAP, I found it difficult to understand the explanations from various articles. I get a feeling that A and P can go together (I know this is not the case, and that's why I fail to understand!). Explaining in simple terms, what are A and P and the difference between them?

Consistency means that data is the same across the cluster, so you can read or write from/to any node and get the same data. Availability means the ability to access the cluster even if a node in the cluster goes down. Partition tolerance means that the cluster continues to function even if there is a "partition" (communication break) between two nodes (both nodes are up, but can't communicate). In order to get both availability and partition tolerance, you have to give up consistency. Consider if you have two nodes, X and Y, in a master-master setup. Now, there is a break between network communication between X and Y, so they can't sync updates. At this point you can either: A) Allow the nodes to get out of sync (giving up consistency), or B) Consider the cluster to be "down" (giving up availability) All the combinations available are: <ul> <li> CA - data is consistent between all nodes - as long as all nodes are online - and you can read/write from any node and be sure that the data is the same, but if you ever develop a partition between nodes, the data will be out of sync (and won't re-sync once the partition is resolved).</li> <li> CP - data is consistent between all nodes, and maintains partition tolerance (preventing data desync) by becoming unavailable when a node goes down.</li> <li> AP - nodes remain online even if they can't communicate with each other and will resync data once the partition is resolved, but you aren't guaranteed that all nodes will have the same data (either during or after the partition)</li> </ul> You should note that CA systems don't practically exist (even if some systems claim to be so).

Here is how I'm discussing CAP, regarding P particularly. CA is only possible if you are OK with a monolithic, single server database (maybe with replication but all data on one "failure block" - servers are not considered to partially fail). If your problem requires scale out, distributed, and multi-server --- network partitions can happen. You're already requiring P. Few problems I approach are amenable to single-server-always paradigms (or, as Stonebraker said, "distributed is table stakes"). If you can find a CA problem, solutions like a traditional non-scale-out RDBMS provides a lot of benefits. For me, rare: so we move on to discussing AP vs CP. You only choose between AP and CP operation when you have a partition. If the network & hardware is operating correctly, you get your cake and eat it too. Let's discuss the AP / CP distinction. AP - when there is a network partition, let the independent parts operate freely. CP - when there is a network partition, shut down nodes or disallow reads and writes so there are deterministic failures. I like architectures that can do both, because some problems are AP and some are CP - and some databases can do both. Among the CP and AP solutions, there are subtleties as well. For example, in an AP dataset, you have the possibility of both inconsistent reads, and generating write conflicts - these are two different possible AP modes. Can your system be configured for AP with high read availability but disallows write conflicts? Or can your AP system accept write conflicts, with a strong and flexible resolution system? Will you need both eventually, or can you pick a system that only does one? In a CP system, how much unavailability do you get with small partitions (single server), if any? Greater replication can increase unavailability in a CP system, how does the system handle those tradeoffs? These are all questions to ask with CP vs AP. A great read in this area right now is Brewer's "12 years later" post. I believe this moves forward the CAP debate with clarity, and recommend it highly. http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed

Simple way to understand CAP theorem: <blockquote> In case of network partition, one needs to choose between perfect availability and perfect consistency. Picking consistency means not being able to answer a client's query as the system cannot guarantee to return the most recent write. This sacrifices availability. Picking availability means being able to respond to a client's request but the system cannot guarantee consistency, i.e., the most recent value written. Available systems provide the best possible answer under the given circumstance. </blockquote> This explanation is from this excellent article. Hope it will help.

In simple CAP theorem states that its impossible for a distributed system to simultaneously provide all three guarantees: <img src="https://i.stack.imgur.com/M2bfY.png" alt="enter image description here"> Consistency Every node contains same data at the same time Availability At least one node must be available to serve data every time Partition tolerance Failure of the system is very rare Mostly every system can only guarantee minimum two features either CA, AP, or CP.

The CAP Theorem for distributed computing was published by Eric Brewer. This states that it is not possible for a distributed computer system to simultaneously provide all three of the following guarantees: <ul> <li> Consistency (all nodes see the same data even at the same time with concurrent updates ) </li> <li> Availability (a guarantee that every request receives a response about whether it was successful or failed) </li> <li> Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system) </li> </ul> The CAP acronym corresponds to these three guarantees. This theorem has created the base for modern distributed computing approaches. Worlds most high volume traffic companies (e.g. Amazon, Google, Facebook) use this as basis for deciding their application architecture. It's important to understand that only two of these three conditions can be guaranteed to be met by a system.

CAP theorem - Availability and Partition Tolerance

10 Answers

Consistency means that data is the same across the cluster, so you can read or write from/to any node and get the same data.

Availability means the ability to access the cluster even if a node in the cluster goes down.

Partition tolerance means that the cluster continues to function even if there is a "partition" (communication break) between two nodes (both nodes are up, but can't communicate).

In order to get both availability and partition tolerance, you have to give up consistency. Consider if you have two nodes, X and Y, in a master-master setup. Now, there is a break between network communication between X and Y, so they can't sync updates. At this point you can either:

A) Allow the nodes to get out of sync (giving up consistency), or

B) Consider the cluster to be "down" (giving up availability)

All the combinations available are:

CA - data is consistent between all nodes - as long as all nodes are online - and you can read/write from any node and be sure that the data is the same, but if you ever develop a partition between nodes, the data will be out of sync (and won't re-sync once the partition is resolved).
CP - data is consistent between all nodes, and maintains partition tolerance (preventing data desync) by becoming unavailable when a node goes down.
AP - nodes remain online even if they can't communicate with each other and will resync data once the partition is resolved, but you aren't guaranteed that all nodes will have the same data (either during or after the partition)

You should note that CA systems don't practically exist (even if some systems claim to be so).

198

answered Oct 02 '22 15:10

Chris Heald

Considering P in equal terms with C and A is a bit of a mistake, rather '2 out of 3' notion among C,A,P is misleading. The succinct way I would explain CAP theorem is, "In a distributed data store, at the time of network partition you have to chose either Consistency or Availability and cannot get both". Newer NoSQL systems are trying to focus on Availability while traditional ACID databases had a higher focus on Consistency.

You really cannot choose CA, network partition is not something anyone would like to have, it is just an undesirable reality of a distributed system, networks can fail. Question is what trade off do you pick for your application when that happens. This article from the man who first formulated that term seems to explain this very clearly.

answered Oct 02 '22 16:10

jayadev

Here is how I'm discussing CAP, regarding P particularly.

CA is only possible if you are OK with a monolithic, single server database (maybe with replication but all data on one "failure block" - servers are not considered to partially fail).

If your problem requires scale out, distributed, and multi-server --- network partitions can happen. You're already requiring P. Few problems I approach are amenable to single-server-always paradigms (or, as Stonebraker said, "distributed is table stakes"). If you can find a CA problem, solutions like a traditional non-scale-out RDBMS provides a lot of benefits.

For me, rare: so we move on to discussing AP vs CP.

You only choose between AP and CP operation when you have a partition. If the network & hardware is operating correctly, you get your cake and eat it too.

Let's discuss the AP / CP distinction.

AP - when there is a network partition, let the independent parts operate freely.

CP - when there is a network partition, shut down nodes or disallow reads and writes so there are deterministic failures.

I like architectures that can do both, because some problems are AP and some are CP - and some databases can do both. Among the CP and AP solutions, there are subtleties as well.

For example, in an AP dataset, you have the possibility of both inconsistent reads, and generating write conflicts - these are two different possible AP modes. Can your system be configured for AP with high read availability but disallows write conflicts? Or can your AP system accept write conflicts, with a strong and flexible resolution system? Will you need both eventually, or can you pick a system that only does one?

In a CP system, how much unavailability do you get with small partitions (single server), if any? Greater replication can increase unavailability in a CP system, how does the system handle those tradeoffs?

These are all questions to ask with CP vs AP.

A great read in this area right now is Brewer's "12 years later" post. I believe this moves forward the CAP debate with clarity, and recommend it highly.

http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed

answered Oct 02 '22 15:10

Brian Bulkowski

CAP Theorem

Consistency:

A read is guaranteed to return the most recent write(like ACID) for a given client. If any request comes during that time it has to wait till data sync is completed across/in the node(s).

Availability:

every node (if not failed) always executes queries and should always respond to requests. It does not matter whether it returns the latest copy or not.

Partition-tolerance:

The system will continue to function when network partitions occur.

Regarding AP, Availability(always accessible) can exist with(Cassendra) or without(RDBMS) partition tolerance

pic source

answered Oct 02 '22 16:10

mrsrinivas

I have gone through lot of links, but none of them could give me satisfactory answer, except one.

Hence I am describing CAP in very simple wordings.

Consistency: Must return same Data, regardless to from which node is it coming.
Availability: Node should respond (must be available).
Partition Tolerance: Cluster should respond (must be available), even if there is a a partition (i.e. network failure) between nodes. ( Also one main reason it confuses more is bad naming convention of it. If I had right, I might have given DNC theorem instead: Data Consistency, Node Availability, Cluster Availability, where each corresponds to Consistency, Availability and Partition Tolerance respectively )

CP database: A CP database delivers consistency and partition tolerance at the expense of availability. When a partition occurs between any two nodes, the system has to shut down the non-consistent node (i.e., make it unavailable) until the partition is resolved.

AP database: An AP database delivers availability and partition tolerance at the expense of consistency. When a partition occurs, all nodes remain available but those at the wrong end of a partition might return an older version of data than others. (When the partition is resolved, the AP databases typically resync the nodes to repair all inconsistencies in the system.)

CA database: A CA database delivers consistency and availability across all nodes. It can’t do this if there is a partition between any two nodes in the system, however, and therefore can’t deliver fault tolerance. In a distributed system, partitions can’t be avoided. So, while we can discuss a CA distributed database in theory, for all practical purposes, a CA distributed database can exist but should not exist.

Hence, this doesn’t mean you can’t have a CA database for your distributed application if you need one. Many relational databases, such as PostgreSQL, deliver consistency and availability and can be deployed to multiple nodes using replication.

Source: https://www.ibm.com/cloud/learn/cap-theorem

answered Oct 02 '22 16:10

Pratik K. Shah

I feel partition tolerance is not explained well in any of the answers so just to explain things in some more detail CAP theorem means:

C: (Linearizability or strong consistency) roughly means

If operation B started after operation A successfully completed, then operation B must see the system in the same state as it was on completion of operation A, or a newer state (but never old state).

“every request received by a non-failing [database] node in the system must result in a [non-error] response”. It’s not sufficient for some node to be able to handle the request: any non-failing node needs to be able to handle it. Many so-called “highly available” (i.e. low downtime) systems actually do not meet this definition of availability.

Partition Tolerance (terribly misnamed) basically means that you’re communicating over an asynchronous network that may delay or drop messages. The internet and all our data centres have this property, so you don’t really have any choice in this matter.

Source: Awesome Martin kleppmann's work

Just to take some example: Cassandra can at max be AP system. But if you configure it to read or write based on Quorum then it does not remain CAP-available (available as per definition of the CAP theorem) and is only P system.

answered Oct 02 '22 16:10

Anurag Sharma

Simple way to understand CAP theorem:

In case of network partition, one needs to choose between perfect availability and perfect consistency.

Picking consistency means not being able to answer a client's query as the system cannot guarantee to return the most recent write. This sacrifices availability.

Picking availability means being able to respond to a client's request but the system cannot guarantee consistency, i.e., the most recent value written. Available systems provide the best possible answer under the given circumstance.

This explanation is from this excellent article. Hope it will help.

answered Oct 02 '22 17:10

Mouna

In simple CAP theorem states that its impossible for a distributed system to simultaneously provide all three guarantees:

enter image description here

Consistency

Every node contains same data at the same time

Availability

At least one node must be available to serve data every time

Partition tolerance

Failure of the system is very rare

Mostly every system can only guarantee minimum two features either CA, AP, or CP.

answered Oct 02 '22 15:10

Jerry

Brewer's keynote, the Gilbert paper, and many other treatments, places C, A and P on an equal footing as desirable properties of an implementation and effectively say 'choose two!'. However, this is often considered to be a misleading presentation, since you cannot build - or choose! - 'partition tolerance': your system either might experience partitions or it won't.

CAP is better understood as describing the tradeoffs you have to make when you are building a system that may suffer partitions. In practice, this is every distributed system: there is no 100% reliable network. So (at least in the distributed context) there is no realistic CA system. You will potentially suffer partitions, therefore you must at some point compromise C or A.

https://github.com/henryr/cap-faq#10-why-do-some-people-get-annoyed-when-i-characterise-my-system-as-ca

answered Oct 02 '22 16:10

digit plumber

The CAP Theorem for distributed computing was published by Eric Brewer. This states that it is not possible for a distributed computer system to simultaneously provide all three of the following guarantees:

Consistency (all nodes see the same data even at the same time with concurrent updates )
Availability (a guarantee that every request receives a response about whether it was successful or failed)
Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)

The CAP acronym corresponds to these three guarantees. This theorem has created the base for modern distributed computing approaches. Worlds most high volume traffic companies (e.g. Amazon, Google, Facebook) use this as basis for deciding their application architecture. It's important to understand that only two of these three conditions can be guaranteed to be met by a system.

answered Oct 02 '22 16:10

Amir Mokarchi

Related questions
                            
                                Best way to work with dates in Android SQLite [closed]
                            
                                Safely remove migration In Laravel
                            
                                How to compare dates in datetime fields in Postgresql?
                            
                                Use email address as primary key?
                            
                                Find rows that have the same value on a column in MySQL
                            
                                The ALTER TABLE statement conflicted with the FOREIGN KEY constraint
                            
                                MySQL dump by query
                            
                                First-time database design: am I overengineering? [closed]
                            
                                Warning: mysql_connect(): [2002] No such file or directory (trying to connect via unix:///tmp/mysql.sock) in
                            
                                Create a new database with MySQL Workbench
                            
                                Django Model - Case-insensitive Query / Filtering
                            
                                Count the occurrences of DISTINCT values
                            
                                Fastest way to count exact number of rows in a very large table?
                            
                                Advantages and disadvantages of GUID / UUID database keys
                            
                                How to cache data in a MVC application
                            
                                How to randomly select rows in SQL?
                            
                                Backup/Restore a dockerized PostgreSQL database
                            
                                Storing JSON in database vs. having a new column for each key
                            
                                Is the primary key automatically indexed in MySQL?
                            
                                Need to list all triggers in SQL Server database with table name and table's schema

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

CAP theorem - Availability and Partition Tolerance

Tags:

database

nosql

consistency

availability

Manikandan Kannan

People also ask

10 Answers

Chris Heald

jayadev

Brian Bulkowski

Consistency:

Availability:

Partition-tolerance:

mrsrinivas

Pratik K. Shah

Anurag Sharma

Mouna

Jerry

digit plumber

Amir Mokarchi

Recent Activity

Donate For Us