How does Hadoop's HDFS High Availability feature affect the CAP Theorem?

According to everything I've read so far about the CAP Theorem, no distributed system can provide all three of Availability, Consistency, and Partition Tolerance.

Now, Hadoop 2.x introduced a new feature that can be configured to remove the single point of failure that Hadoop clusters used to have (the single NameNode). With that, the cluster becomes highly available, consistent, and partition tolerant. Am I right? Or am I missing something? According to CAP, if a system tries to provide all three properties, it should pay the price in latency. Does the new feature add that latency to the cluster? Or has Hadoop cracked the CAP Theorem?
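For reference, the feature in question is NameNode High Availability, which is driven by configuration. Here is a minimal client-side sketch in Java; the nameservice name mycluster and the hostnames are invented for illustration (the configuration keys themselves are the standard HDFS HA ones), and it assumes the Hadoop client libraries are on the classpath:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class HaClientSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Clients address a logical nameservice, not a single NameNode host.
            conf.set("fs.defaultFS", "hdfs://mycluster");
            conf.set("dfs.nameservices", "mycluster");
            // Two NameNodes: one active, one standby.
            conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
            conf.set("dfs.namenode.rpc-address.mycluster.nn1",
                    "namenode1.example.com:8020"); // hypothetical host
            conf.set("dfs.namenode.rpc-address.mycluster.nn2",
                    "namenode2.example.com:8020"); // hypothetical host
            // On failover, the client transparently retries against the other NameNode.
            conf.set("dfs.client.failover.proxy.provider.mycluster",
                    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

            FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
            System.out.println("Connected to " + fs.getUri());
        }
    }

With this in place, clients talk to the logical nameservice and fail over to the standby NameNode when the active one dies, instead of depending on a single NameNode host.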

asked Mar 17 '23 by Deleteman

1 Answer

HDFS does not provide Availability in the case of multiple correlated failures (for instance, three failed DataNodes that hold replicas of the same HDFS block). NameNode HA removes the single point of failure, but it does not change this trade-off: in the common Quorum Journal Manager setup, the active NameNode must reach a quorum of JournalNodes to commit edits, and if failures or a partition make that quorum (or all replicas of a block) unreachable, HDFS stops serving the affected operations rather than return inconsistent data. So nothing about CAP is "cracked"; the price is paid in quorum-write latency and in unavailability during partitions.
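To make the correlated-failure case concrete, here is a toy sketch (plain Java, not HDFS code; the block and node names are invented). With the default replication factor of 3, a block becomes unreadable exactly when all three DataNodes holding its replicas fail at once:

    import java.util.*;

    public class ReplicaLossDemo {
        public static void main(String[] args) {
            // Hypothetical block -> replica placement, replication factor 3.
            Map<String, List<String>> blockReplicas = Map.of(
                    "blk_001", List.of("dn1", "dn2", "dn3"),
                    "blk_002", List.of("dn2", "dn4", "dn5"));

            // Three DataNodes fail together (e.g., a rack loses power).
            Set<String> failed = Set.of("dn1", "dn2", "dn3");

            for (Map.Entry<String, List<String>> e : blockReplicas.entrySet()) {
                boolean lost = failed.containsAll(e.getValue());
                System.out.println(e.getKey() + (lost
                        ? " is unavailable: every replica is on a failed node"
                        : " is still readable from a surviving replica"));
            }
        }
    }

Here blk_001 is lost because all three of its replicas sit on failed nodes, while blk_002 survives; the system stays consistent but gives up availability for the lost block.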

From CAP Confusion: Problems with partition tolerance

Systems such as ZooKeeper are explicitly sequentially consistent because there are few enough nodes in a cluster that the cost of writing to quorum is relatively small. The Hadoop Distributed File System (HDFS) also chooses consistency – three failed datanodes can render a file’s blocks unavailable if you are unlucky. Both systems are designed to work in real networks, however, where partitions and failures will occur, and when they do both systems will become unavailable, having made their choice between consistency and availability. That choice remains the unavoidable reality for distributed data stores.
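The quorum behaviour the quote describes can be sketched in a few lines. This is a toy majority check, not ZooKeeper's actual protocol: with N nodes, a write commits only when a majority acknowledges it, so the minority side of a partition refuses writes rather than accept a potentially divergent one:

    import java.util.List;

    public class QuorumWriteDemo {
        // Commit a write only if a majority of nodes acknowledge it.
        static boolean quorumWrite(List<Boolean> nodeReachable) {
            long acks = nodeReachable.stream().filter(up -> up).count();
            int majority = nodeReachable.size() / 2 + 1;
            return acks >= majority;
        }

        public static void main(String[] args) {
            // 5-node ensemble; a partition leaves this side with only 2 reachable nodes.
            System.out.println(quorumWrite(List.of(true, true, false, false, false))); // false: refuse (unavailable)
            // The other side of the partition still reaches 3 of 5 nodes.
            System.out.println(quorumWrite(List.of(true, true, true, false, false)));  // true: write commits
        }
    }

That refusal is the CAP trade-off in action: during a partition the minority side stays consistent by becoming unavailable, and even without partitions the cost shows up as the latency of waiting for quorum acknowledgements, which is the "price" the question asks about.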

answered Apr 06 '23 by Andrey Sozykin