
What is HDFS's write consistency?

Tags:

hadoop

hdfs

Does HDFS have write consistency levels like Cassandra? Say I have finished writing a file into HDFS: when do I get the successful response? Is it when the first replica is completed, or when all 3 replicas are completed? (Suppose the replication factor is 3.)

Jack asked May 30 '16


2 Answers

It works differently in Hadoop compared to Cassandra.

You have two parameters related to replication.

dfs.replication : Default block replication. The actual number of replicas can be specified when the file is created. The default is used if a replication factor is not specified at create time.

dfs.namenode.replication.min : Minimum block replication.

Once dfs.namenode.replication.min has been met, the write operation is treated as successful.
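This success rule can be sketched in plain Java. This is a hypothetical illustration, not HDFS internals: the class and method names are made up, only the property names and defaults come from the answer above.

```java
// Hypothetical sketch: a write is reported successful once at least
// dfs.namenode.replication.min replicas have acknowledged, even while
// the remaining copies (up to dfs.replication) are still being made.
public class MinReplicationRule {
    static boolean writeSucceeded(int acknowledgedReplicas, int minReplication) {
        return acknowledgedReplicas >= minReplication;
    }

    public static void main(String[] args) {
        int minReplication = 1; // dfs.namenode.replication.min (default 1)
        // dfs.replication is 3, but one acknowledged replica is enough:
        System.out.println(writeSucceeded(1, minReplication)); // true
        System.out.println(writeSucceeded(0, minReplication)); // false
    }
}
```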

But replication up to dfs.replication happens in a sequential pipeline: the first DataNode writes the block and forwards it to the second DataNode, and the second DataNode writes the block and forwards it to the third DataNode.

DFSOutputStream also maintains an internal queue of packets that are waiting to be acknowledged by DataNodes, called the ack queue.

A packet is removed from the ack queue only when it has been acknowledged by all the Datanodes in the pipeline.
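The ack-queue behaviour can be simulated in a few lines of plain Java. This is a toy model, not the real DFSOutputStream: `drain` and `acksPerPacket` are invented names, and the only rule it encodes is the one stated above, that a packet leaves the queue only once every DataNode in the pipeline has acknowledged it.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of the ack queue: packets leave the head of the queue
// only when their ack count equals the pipeline length (rep factor).
public class AckQueueDemo {
    static int drain(Queue<Integer> ackQueue, int[] acksPerPacket, int pipelineLength) {
        int removed = 0;
        while (!ackQueue.isEmpty()
               && acksPerPacket[ackQueue.peek()] == pipelineLength) {
            ackQueue.remove(); // fully acknowledged by all DataNodes
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) {
        int pipelineLength = 3;                 // rep = 3
        Queue<Integer> ackQueue = new ArrayDeque<>();
        ackQueue.add(0);
        ackQueue.add(1);

        int[] acks = {3, 2}; // packet 0: all acked; packet 1: one ack missing
        System.out.println(drain(ackQueue, acks, pipelineLength)); // prints 1
        // Packet 1 stays in the ack queue until its third ack arrives.
    }
}
```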

You can find more details in related SE question:

Hadoop 2.0 data write operation acknowledgement

Ravindra babu answered Sep 18 '22


In the normal process, the client waits until all the replicas have acknowledged all the data packets. But if a DataNode fails during the write, the client continues writing the data to the remaining DataNodes, and the write succeeds if the number of DataNodes that acknowledge receiving all the packets is equal to or greater than the minimum number of replicas (default 1). In that case, since the number of replicas is less than the required number, the blocks are marked as under-replicated and the NameNode replicates them asynchronously.

So the write can return success even if not all the required replicas were created.

If you want the write to succeed only when all replicas have been created, set the property dfs.namenode.replication.min (default 1) equal to dfs.replication (default 3).
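Both properties are configured in hdfs-site.xml. A fragment along these lines (values shown are the defaults, except for the raised minimum) would enforce the stricter behaviour; note that raising the minimum also means a write fails outright if fewer than 3 DataNodes are reachable:

```xml
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.namenode.replication.min</name>
  <!-- Raised from the default of 1 so a write only succeeds
       once all 3 replicas have been written. -->
  <value>3</value>
</property>
```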

RojoSam answered Sep 18 '22