How does Hadoop's HDFS High Availability feature affect the CAP Theorem?

According to everything I've read so far about the CAP Theorem, no distributed system can provide all three of Availability, Consistency, and Partition Tolerance.

Now, Hadoop 2.x introduced a new feature that can be configured to remove the single point of failure that Hadoop clusters used to have (the single NameNode). With that, the cluster becomes highly available, consistent, and partition tolerant. Am I right? Or am I missing something? According to CAP, if a system tries to provide all three properties, it should pay the price in latency. Does the new feature add that latency to the cluster? Or has Hadoop cracked the CAP Theorem?
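For reference, the feature in question is NameNode High Availability, which is driven by configuration. Here is a minimal client-side sketch in Java; the nameservice name mycluster and the hostnames are invented for illustration (the configuration keys themselves are the standard HDFS HA ones), and it assumes the Hadoop client libraries are on the classpath:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class HaClientSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Clients address a logical nameservice, not a single NameNode host.
            conf.set("fs.defaultFS", "hdfs://mycluster");
            conf.set("dfs.nameservices", "mycluster");
            // Two NameNodes: one active, one standby.
            conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
            conf.set("dfs.namenode.rpc-address.mycluster.nn1",
                    "namenode1.example.com:8020"); // hypothetical host
            conf.set("dfs.namenode.rpc-address.mycluster.nn2",
                    "namenode2.example.com:8020"); // hypothetical host
            // On failover, the client transparently retries against the other NameNode.
            conf.set("dfs.client.failover.proxy.provider.mycluster",
                    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

            FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
            System.out.println("Connected to " + fs.getUri());
        }
    }

With this in place, clients talk to the logical nameservice and fail over to the standby NameNode when the active one dies, instead of depending on a single NameNode host.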

asked Mar 17 '23 by Deleteman

1 Answer

HDFS does not provide Availability in the case of multiple correlated failures (for instance, three failed DataNodes that hold replicas of the same HDFS block). NameNode HA removes the single point of failure, but it does not change this trade-off: in the common Quorum Journal Manager setup, the active NameNode must reach a quorum of JournalNodes to commit edits, and if failures or a partition make that quorum (or all replicas of a block) unreachable, HDFS stops serving the affected operations rather than return inconsistent data. So nothing about CAP is "cracked"; the price is paid in quorum-write latency and in unavailability during partitions.
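To make the correlated-failure case concrete, here is a toy sketch (plain Java, not HDFS code; the block and node names are invented). With the default replication factor of 3, a block becomes unreadable exactly when all three DataNodes holding its replicas fail at once:

    import java.util.*;

    public class ReplicaLossDemo {
        public static void main(String[] args) {
            // Hypothetical block -> replica placement, replication factor 3.
            Map<String, List<String>> blockReplicas = Map.of(
                    "blk_001", List.of("dn1", "dn2", "dn3"),
                    "blk_002", List.of("dn2", "dn4", "dn5"));

            // Three DataNodes fail together (e.g., a rack loses power).
            Set<String> failed = Set.of("dn1", "dn2", "dn3");

            for (Map.Entry<String, List<String>> e : blockReplicas.entrySet()) {
                boolean lost = failed.containsAll(e.getValue());
                System.out.println(e.getKey() + (lost
                        ? " is unavailable: every replica is on a failed node"
                        : " is still readable from a surviving replica"));
            }
        }
    }

Here blk_001 is lost because all three of its replicas sit on failed nodes, while blk_002 survives; the system stays consistent but gives up availability for the lost block.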

From CAP Confusion: Problems with partition tolerance

Systems such as ZooKeeper are explicitly sequentially consistent because there are few enough nodes in a cluster that the cost of writing to quorum is relatively small. The Hadoop Distributed File System (HDFS) also chooses consistency – three failed datanodes can render a file’s blocks unavailable if you are unlucky. Both systems are designed to work in real networks, however, where partitions and failures will occur, and when they do both systems will become unavailable, having made their choice between consistency and availability. That choice remains the unavoidable reality for distributed data stores.
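The quorum behaviour the quote describes can be sketched in a few lines. This is a toy majority check, not ZooKeeper's actual protocol: with N nodes, a write commits only when a majority acknowledges it, so the minority side of a partition refuses writes rather than accept a potentially divergent one:

    import java.util.List;

    public class QuorumWriteDemo {
        // Commit a write only if a majority of nodes acknowledge it.
        static boolean quorumWrite(List<Boolean> nodeReachable) {
            long acks = nodeReachable.stream().filter(up -> up).count();
            int majority = nodeReachable.size() / 2 + 1;
            return acks >= majority;
        }

        public static void main(String[] args) {
            // 5-node ensemble; a partition leaves this side with only 2 reachable nodes.
            System.out.println(quorumWrite(List.of(true, true, false, false, false))); // false: refuse (unavailable)
            // The other side of the partition still reaches 3 of 5 nodes.
            System.out.println(quorumWrite(List.of(true, true, true, false, false)));  // true: write commits
        }
    }

That refusal is the CAP trade-off in action: during a partition the minority side stays consistent by becoming unavailable, and even without partitions the cost shows up as the latency of waiting for quorum acknowledgements, which is the "price" the question asks about.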

answered Apr 06 '23 by Andrey Sozykin