Can someone highlight the technical details and explain when to use which?
In the current HDFS implementation (0.23.3), hflush and hsync are the same: hsync simply invokes hflush. hflush guarantees that flushed data becomes visible to new readers, but it does not guarantee that the data has been flushed to persistent storage on the datanode. Using hflush may therefore lose some data if a datanode fails. hsync is designed to guarantee that all data is written to the disk device, but this is not implemented yet.
In the HDFS 2.0.* alpha releases, hsync is implemented correctly.
You can get more details in HBase, HDFS and durable sync.
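To make the visibility versus durability distinction concrete, here is a minimal local-filesystem analogy using only the Java standard library. It is not the HDFS API: on HDFS the corresponding calls are FSDataOutputStream.hflush() and FSDataOutputStream.hsync(). The class name FlushVsSync and the helpers writeVisible, writeDurable, and demo are hypothetical names made up for this sketch. The assumption is that a plain channel write behaves like hflush (data is handed off and readers can see it, but it may still sit in a cache) and that FileChannel.force(true) behaves like a working hsync (data is pushed to the storage device).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class: a local-file analogy, not HDFS code.
class FlushVsSync {

    // Analogue of hflush: hand the bytes to the OS so that new readers can see them.
    // Nothing here forces the OS to write its cache to the disk device.
    static void writeVisible(FileChannel ch, String record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
    }

    // Analogue of a working hsync: same write, then ask the OS to push
    // file content and metadata to the underlying storage device.
    static void writeDurable(FileChannel ch, String record) throws IOException {
        writeVisible(ch, record);
        ch.force(true);
    }

    // Writes one "visible" and one "durable" record, then reads the file back
    // through a separate reader while the writer is still open.
    static String demo() {
        try {
            Path log = Files.createTempFile("edits", ".log");
            try (FileChannel ch = FileChannel.open(
                    log, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
                writeVisible(ch, "cheap\n");
                writeDurable(ch, "critical\n");
                return new String(Files.readAllBytes(log), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(log);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

Both records are readable in either case; the difference only shows up after a crash or power loss, which this sketch does not simulate. As a rough rule of thumb under the semantics described above: use the cheaper hflush-style call when readers only need to see the data, and the hsync-style call for records that must survive a datanode failure, keeping in mind that on 0.23.3 hsync gives no more than hflush.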