from what i've read, HDFS is fast because it relaxes some POSIX technics, but how does this work? or at least which ones? i've not found the answer because on google, i've found someone redirecting the asker to a large site!
According to the Hadoop - The Definitive Guide (suggest to get the book)
After creating a file, it is visible in the filesystem namespace, as expected:
A coherency model for a filesystem describes the data visibility of reads and writes for a file. HDFS trades off some POSIX requirements for performance, so some operations may behave differently than you expect them to.
However, any content written to the file is not guaranteed to be visible, even if the stream is flushed. So the file appears to have a length of zero:
Once more than a block’s worth of data has been written, the first block will be visible to new readers. This is true of subsequent blocks, too: it is always the current block being written that is not visible to other readers.
HDFS provides a method for forcing all buffers to be synchronized to the datanodes via the sync() method on FSDataOutputStream. After a successful return from sync(), HDFS guarantees that the data written up to that point in the file is persisted and visible to all new readers:
Another thing is
There are three types of permission: the read permission (r), the write permission ( w), and the execute permission (x). The read permission is required to read files or list the contents of a directory. The write permission is required to write a file, or for a directory, to create or delete files or directories in it. The execute permission is ignored for a file since you can’t execute a file on HDFS (unlike POSIX), and for a directory it is required to access its children.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With