I just started reading about Hadoop and came across the CAP theorem. Can you please shed some light on which two components of CAP would be applicable to an HDFS system?
The CAP theorem states that it is not possible to guarantee all three of the desirable properties – consistency, availability, and partition tolerance at the same time in a distributed system with data replication.
The CAP theorem is a result from theoretical computer science about distributed data stores. It states that, in the event of a network failure on a distributed database, it is possible to provide either consistency or availability, but not both.
The CAP theorem is often debated, and it applies only to distributed database systems. In a distributed database, network partitions and node crashes can happen. When a network partition happens, the system must tolerate it (the P in CAP). So, to answer your question: it's either CP or AP.
The CAP theorem is also known as Brewer's theorem. According to the CAP theorem, NoSQL databases face a limitation: of the three guarantees a database can offer (consistency, availability and partition tolerance), only two can be achieved at the same time. Answered by Kanak. CAP stands for Consistency, Availability and Partition tolerance.
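The CP-versus-AP choice can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyStore`, `Replica` and the `partitioned` flag are hypothetical names invented for illustration, and the model has nothing to do with how HDFS or any real database is implemented.

```python
class Replica:
    """One copy of a single value (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.value = None


class ToyStore:
    """Two replicas of one value; mode is 'CP' or 'AP'."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True means a and b cannot talk to each other

    def write(self, replica, value):
        if not self.partitioned:
            # Healthy network: update both copies before acknowledging.
            self.a.value = value
            self.b.value = value
            return
        if self.mode == "CP":
            # CP: refuse the write rather than let the copies diverge.
            raise RuntimeError("unavailable: peer replica unreachable")
        # AP: accept the write locally; the other copy becomes stale.
        replica.value = value

    def read(self, replica):
        return replica.value
```

During a partition the CP store rejects the write and both replicas still agree, while the AP store accepts it and the two replicas return different values until the partition heals.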
The document very clearly says: "The consistency model of a Hadoop FileSystem is one-copy-update-semantics; that of a traditional local POSIX filesystem."
(One-copy update semantics means that all processes accessing or updating a given file see its contents as if only a single copy of the file existed.)
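As a rough illustration of what "as if only a single copy existed" means, here is a minimal single-process sketch. `OneCopyFile` and its methods are hypothetical names; the sketch only models the observable behaviour and is not a description of how HDFS replicates data.

```python
class OneCopyFile:
    """Toy file with several replicas that always look like one copy."""

    def __init__(self, replica_count=3):
        self._replicas = [b"" for _ in range(replica_count)]

    def append(self, data: bytes) -> None:
        # Build the new contents for every replica first, then publish
        # them together, so no reader ever sees replicas that disagree.
        updated = [contents + data for contents in self._replicas]
        self._replicas = updated

    def read(self, replica_index: int) -> bytes:
        # A reader may be served by any replica.
        return self._replicas[replica_index]

    def replica_count(self) -> int:
        return len(self._replicas)
```

After `append` returns, reading from any replica index gives the same bytes, which is the property the quoted definition describes.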
Moving forward, the document says:
The above-mentioned characteristics point towards the presence of "Consistency" in HDFS.
Source: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/filesystem/introduction.html
HDFS provides High Availability for both Name Nodes and Data Nodes.
Source: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html
It is very clearly mentioned in the documentation (under the section "Operations and failures"):
"The time to complete an operation is undefined and may depend on the implementation and on the state of the system."
This indicates that the "Availability" in the context of CAP is missing in HDFS.
Source: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/filesystem/introduction.html
Given the above-mentioned arguments, I believe HDFS supports "Consistency and Partition Tolerance" and not "Availability" in the context of the CAP theorem.