
CAP with distributed System

Tags:

nosql

hadoop

When we talk about NoSQL distributed database systems, we know that each of them provides two of the three CAP theorem properties. For a distributed cluster, where network and node failures are inevitable, partition tolerance is a necessity, leaving us to choose between availability and consistency. So it's basically CP or AP.

My questions are:

  1. Which category does Hadoop fall under?

  2. Let's say I have a cluster with 6 nodes, A, B, C and D, E, F. During a network failure, nodes A, B, C and nodes D, E, F are split into two independent clusters.

    Now, in a consistent and partition-tolerant (CP) system, since an update on node A won't replicate to node D, the system won't allow users to update or read data until the network is up and running again, effectively taking the database down.

    Whereas an available and partition-tolerant (AP) system would allow a user of node D to see the old data when an update is made on node A, but it doesn't guarantee that user the latest data. After some time, when the network is up and running again, it replicates the latest data from node A to node D, allowing the user of node D to view the latest data.

    From the above two scenarios we can conclude that in an AP model the database never goes down, so users can read and write even during a failure and are promised the latest data once the network is back up. So why do people go for the consistent and partition-tolerant (CP) model? From my perspective, during a network failure AP has an advantage over CP: it allows users to read and write data, while a database under CP is down.

  3. Is there any system that can provide C, A and P together, excluding the concept of Cassandra's eventual consistency?

  4. When does a user choose availability over consistency, and vice versa? Is there any database out there that allows the user to switch between CP and AP as needed?

Thanks in advance :)

Sam asked Nov 12 '13 07:11



1 Answer

HDFS has a single central decision point, the NameNode. As such, it can only fall on the CP side, since taking down the NameNode takes down the entire HDFS system (no availability). Hadoop does not try to hide this:

The NameNode is a Single Point of Failure for the HDFS Cluster. HDFS is not currently a High Availability system. When the NameNode goes down, the file system goes offline. There is an optional SecondaryNameNode that can be hosted on a separate machine. It only creates checkpoints of the namespace by merging the edits file into the fsimage file and does not provide any real redundancy.

Since the decision about where to place data and where it can be read from is always handled by the NameNode, which maintains a consistent view in memory, HDFS is always consistent (C). It is also partition tolerant in that it can handle losing data nodes, subject to the replication factor and data topology strategies.

Is there any system that can provide CAP together?

Yes, such systems are often mentioned in marketing and other non-technical publications.

When does a user choose availability over consistency, and vice versa?

This is a business use case decision. When availability is more important, they choose AP. When consistency is more important, they choose CP. In general, when money changes hands, consistency takes precedence. Almost every other case favors availability.
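To make the trade-off concrete, here is a minimal sketch of the six-node scenario from the question. It is a toy model, not how any real database is implemented: `ToyCluster`, `Unavailable`, the majority rule used for CP, and the last-write-wins reconciliation used for AP are all assumptions chosen for illustration.

```python
class Unavailable(Exception):
    """Raised when the toy CP cluster refuses a request during a partition."""


class ToyCluster:
    def __init__(self, nodes, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        # Every node stores a (version, value) pair.
        self.data = {n: (0, None) for n in nodes}
        # A single group holding all nodes means the network is healthy.
        self.groups = [set(nodes)]

    def partition(self, *groups):
        """Split the network into isolated groups of nodes."""
        self.groups = [set(g) for g in groups]

    def heal(self):
        """Reconnect all nodes; the highest version wins (last-write-wins)."""
        self.groups = [set(self.data)]
        newest = max(self.data.values(), key=lambda pair: pair[0])
        for n in self.data:
            self.data[n] = newest

    def _group_of(self, node):
        group = next(g for g in self.groups if node in g)
        # CP: refuse to serve unless a strict majority of nodes is reachable.
        if self.mode == "CP" and len(group) * 2 <= len(self.data):
            raise Unavailable(f"node {node} cannot reach a majority")
        return group

    def write(self, node, value):
        group = self._group_of(node)
        version = max(self.data[n][0] for n in group) + 1
        for n in group:  # replicate only to the nodes we can reach
            self.data[n] = (version, value)

    def read(self, node):
        self._group_of(node)
        return self.data[node][1]


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        cluster = ToyCluster("ABCDEF", mode)
        cluster.write("A", "v1")
        cluster.partition("ABC", "DEF")
        try:
            cluster.write("A", "v2")
            print(mode, "read from D during partition:", cluster.read("D"))
        except Unavailable as exc:
            print(mode, "rejected the request:", exc)
        cluster.heal()
        print(mode, "read from D after heal:", cluster.read("D"))
```

In this sketch the CP cluster rejects the write because a 3/3 split leaves neither side with a majority, while the AP cluster accepts it and serves stale data from D until the partition heals. Note that if both sides had accepted different writes, last-write-wins would silently discard one of them, which is the kind of risk that pushes money-handling systems toward CP.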

Is there any database out there that allows the user to switch between CP and AP as needed?

Systems that allow you to tune both the write and the read quorums can be configured as either CP or AP, depending on your needs.
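A common rule of thumb for quorum-replicated stores: with N replicas, a write quorum W and a read quorum R, every read quorum overlaps every write quorum when R + W > N, so a read sees the latest acknowledged write. Smaller quorums tolerate more unreachable replicas at the cost of possibly stale reads. The helper below is a hypothetical illustration of that arithmetic (the name `quorum_profile` and the returned keys are made up here, not taken from any particular database):

```python
def quorum_profile(n, r, w):
    """Summarize what a read/write quorum choice implies for n replicas."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("quorums must be between 1 and the replica count")
    return {
        # Read and write quorums always share at least one replica.
        "overlapping_quorums": r + w > n,
        # Replicas that may be unreachable while reads still succeed.
        "read_failures_tolerated": n - r,
        # Replicas that may be unreachable while writes still succeed.
        "write_failures_tolerated": n - w,
    }


if __name__ == "__main__":
    # Leans CP: quorums overlap, but one replica down is all it tolerates.
    print(quorum_profile(3, 2, 2))
    # Leans AP: survives two replicas down, but reads may be stale.
    print(quorum_profile(3, 1, 1))
```

With three replicas, R = W = 2 trades some availability for overlapping quorums, while R = W = 1 keeps serving requests with two replicas unreachable but gives up the overlap guarantee.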

Remus Rusanu answered Nov 03 '22 08:11