According to everything I've read so far about the CAP theorem, no distributed system can simultaneously guarantee all three of availability, consistency, and partition tolerance.
Now, Hadoop 2.x introduced a new feature (HDFS NameNode High Availability) which can be configured to remove the single point of failure that Hadoop clusters used to have: the single NameNode. With that, the cluster becomes highly available, consistent, and partition tolerant. Am I right, or am I missing something? According to CAP, a system that tries to provide all three has to pay a price in latency. Does the new feature add that latency to the cluster, or has Hadoop somehow cracked the CAP theorem?
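For reference, the configuration I mean is the one where clients talk to a logical nameservice that fails over between two NameNodes. A minimal sketch of the client side (the nameservice name `mycluster`, the NameNode IDs `nn1`/`nn2`, and the host names are placeholders, not values from any real cluster):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HaClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Logical nameservice instead of a single NameNode host (names are illustrative).
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2.example.com:8020");

        // Client-side failover: if the active NameNode is unreachable,
        // the client retries against the other one.
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                 "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        // The client addresses the nameservice, not a specific NameNode host.
        FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
        System.out.println(fs.exists(new Path("/")));
    }
}
```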
HDFS does not provide availability in the case of multiple correlated failures (for instance, three failed DataNodes that all hold replicas of the same HDFS block).
From CAP Confusion: Problems with partition tolerance
Systems such as ZooKeeper are explicitly sequentially consistent because there are few enough nodes in a cluster that the cost of writing to quorum is relatively small. The Hadoop Distributed File System (HDFS) also chooses consistency – three failed datanodes can render a file’s blocks unavailable if you are unlucky. Both systems are designed to work in real networks, however, where partitions and failures will occur, and when they do both systems will become unavailable, having made their choice between consistency and availability. That choice remains the unavoidable reality for distributed data stores.
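To make that point concrete: with the default replication factor of 3, each block lives on exactly three DataNodes, and if all three fail before re-replication catches up, the block is unreadable even though the NameNode itself is highly available. A small sketch (the path `/data/example.txt` is illustrative) that prints which hosts hold each block of a file:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockHostsSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/example.txt");  // illustrative path

        FileStatus status = fs.getFileStatus(file);
        System.out.println("replication factor: " + status.getReplication());

        // Each block is stored on 'replication factor' DataNodes; if all of
        // those hosts are down at once, that block is unreadable. HDFS stays
        // consistent, but gives up availability for the affected file.
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("block at offset " + block.getOffset()
                    + " lives on: " + String.join(", ", block.getHosts()));
        }
    }
}
```

So NameNode HA removes one single point of failure, but HDFS as a whole still picks consistency over availability when partitions or correlated failures hit.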