
Differences between the hflush and hsync APIs in HDFS

Can someone explain the technical details of each and when to use which?

Inder Singh asked Apr 12 '12 10:04



1 Answer

In the current HDFS implementation (0.23.3), hflush and hsync are the same: hsync simply invokes hflush. hflush guarantees that flushed data becomes visible to new readers, but it does not guarantee that the data has been flushed to persistent storage on the datanode. So with hflush you may lose some data if datanodes fail. hsync is designed to guarantee that all data has been written to the disk device, but that is not implemented yet.

In the HDFS 2.0.* alpha releases, hsync is implemented correctly.
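
To make the visible-versus-durable distinction concrete, here is a rough local-filesystem analogy in plain Java. It is not the HDFS API: `BufferedOutputStream.flush()` stands in for hflush (data leaves the writer's buffer and new readers can see it), and `FileDescriptor.sync()` stands in for hsync (the OS is asked to push the data to the storage device). The class name `FlushVsSync`, the method `demo`, and the record contents are made up for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Local-file analogy only; HDFS itself is not involved here.
class FlushVsSync {

    // Reads the file the way a brand-new reader would.
    static String read(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Returns what a new reader sees after write, after flush, and after sync.
    static String[] demo(Path file) throws IOException {
        String[] seen = new String[3];
        try (FileOutputStream fos = new FileOutputStream(file.toFile());
             BufferedOutputStream out = new BufferedOutputStream(fos, 8192)) {

            out.write("record-1\n".getBytes(StandardCharsets.UTF_8));
            seen[0] = read(file);   // data is still in the writer's buffer

            out.flush();            // hflush-like: now visible to new readers,
            seen[1] = read(file);   // but possibly only in the OS page cache

            fos.getFD().sync();     // hsync-like: ask the OS to write to the device
            seen[2] = read(file);   // visibility is unchanged, durability is not
        }
        return seen;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("flush-vs-sync", ".log");
        String[] seen = demo(file);
        System.out.println("after write: " + seen[0].length() + " bytes visible");
        System.out.println("after flush: " + seen[1].length() + " bytes visible");
        System.out.println("after sync:  " + seen[2].length() + " bytes visible");
        Files.delete(file);
    }
}
```

The point of the analogy: a reader cannot tell the flush and sync steps apart, because both leave the same bytes visible. The difference only shows up after a crash or power loss, which is why hflush is the cheaper call and hsync is the one to use when a record must survive a datanode failure.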

You can find more details in "HBase, HDFS and durable sync".

zsxwing answered Nov 09 '22 04:11
