
How HDFS relaxes POSIX

Tags: posix, nosql, hdfs

From what I've read, HDFS is fast because it relaxes some POSIX requirements, but how does this work? Or at least, which requirements does it relax? I haven't found an answer: the only result I found on Google was someone redirecting the asker to a large site.

Abdelouahab Pp asked Jul 05 '12 20:07



1 Answer

According to Hadoop: The Definitive Guide (I suggest getting the book):

A coherency model for a filesystem describes the data visibility of reads and writes for a file. HDFS trades off some POSIX requirements for performance, so some operations may behave differently than you expect them to.

After creating a file, it is visible in the filesystem namespace, as expected.

However, any content written to the file is not guaranteed to be visible, even if the stream is flushed, so the file appears to have a length of zero.
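For contrast, here is roughly what a local filesystem gives you. This is a minimal sketch using only the standard Java API; the class and method names (`LocalVisibilityDemo`, `lengthAfterSync`) are made up for illustration. After flushing and syncing, the written bytes should be visible through a separate handle even while the writer is still open, which is the behaviour HDFS does not promise for the block currently being written.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows visibility of written data on a LOCAL filesystem.
class LocalVisibilityDemo {

    // Writes `content` to a temp file, flushes and syncs it, and returns the
    // file length observed through a separate File handle while the output
    // stream is still open.
    static long lengthAfterSync(String content) throws IOException {
        File f = File.createTempFile("visibility", ".txt");
        f.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(content.getBytes(StandardCharsets.UTF_8));
            out.flush();         // push any Java-level buffers to the OS
            out.getFD().sync();  // ask the OS to commit the data (similar to fsync)
            return f.length();   // length as seen by another handle
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(lengthAfterSync("content"));
    }
}
```

On a local disk the reported length matches the number of bytes written; the quoted passage says that the equivalent check against HDFS, after only a flush, may still report zero.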

Once more than a block’s worth of data has been written, the first block will be visible to new readers. This is true of subsequent blocks, too: it is always the current block being written that is not visible to other readers.

HDFS provides a method for forcing all buffers to be synchronized to the datanodes via the sync() method on FSDataOutputStream. After a successful return from sync(), HDFS guarantees that the data written up to that point in the file is persisted and visible to all new readers.
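The visibility rules quoted above can be sketched as a toy model. This is not the Hadoop API: `ToyHdfsFile` and its methods are hypothetical names, block sizes are counted in plain bytes, and the model only tracks what a new reader would see. As far as I know, later Hadoop releases expose hflush() and hsync() on FSDataOutputStream for what the book calls sync(), so check the API docs for your version.

```java
// Hypothetical toy model of the HDFS coherency rules described above.
// It does not talk to a cluster; it only tracks byte counts.
class ToyHdfsFile {
    private final int blockSize;
    private long written = 0; // bytes the writer has produced so far
    private long synced = 0;  // bytes made visible by an explicit sync()

    ToyHdfsFile(int blockSize) {
        this.blockSize = blockSize;
    }

    void write(int nBytes) {
        written += nBytes;
    }

    // In this model a plain flush changes nothing for new readers.
    void flush() {
    }

    // Everything written so far becomes visible to new readers.
    void sync() {
        synced = written;
    }

    // Length a new reader would see: all blocks before the one currently
    // being written, or the synced prefix, whichever is larger.
    long visibleLength() {
        // A block counts as complete only once MORE than its capacity
        // has been written, i.e. the writer has moved on to the next block.
        long completed = written > 0 ? ((written - 1) / blockSize) * blockSize : 0;
        return Math.max(completed, synced);
    }

    public static void main(String[] args) {
        ToyHdfsFile f = new ToyHdfsFile(4);
        f.write(3);
        f.flush();
        System.out.println(f.visibleLength()); // current block only: nothing visible
        f.write(2);
        System.out.println(f.visibleLength()); // first block is now visible
        f.sync();
        System.out.println(f.visibleLength()); // everything written is visible
    }
}
```

With a block size of 4, the three printed lengths are 0, 4, and 5: the current block stays hidden until the writer either moves past it or calls sync().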

Another difference concerns file permissions:

There are three types of permission: the read permission (r), the write permission (w), and the execute permission (x). The read permission is required to read files or list the contents of a directory. The write permission is required to write a file, or for a directory, to create or delete files or directories in it. The execute permission is ignored for a file since you can’t execute a file on HDFS (unlike POSIX), and for a directory it is required to access its children.
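The permission rules in that passage can be written down as a small decision function. This is a sketch under simplifying assumptions: it looks at a single `rwx` triple (no owner/group/other distinction, no superuser), and `ToyHdfsPermission`, `Kind`, and `Action` are hypothetical names, not Hadoop classes.

```java
// Hypothetical model of the rwx rules quoted above; not the Hadoop API.
class ToyHdfsPermission {
    enum Kind { FILE, DIRECTORY }

    enum Action {
        READ_OR_LIST,      // read a file, or list a directory
        WRITE_OR_MODIFY,   // write a file, or create/delete entries in a directory
        ACCESS_CHILDREN,   // traverse into a directory
        EXECUTE_FILE       // run a file as a program
    }

    // `rwx` is a three-character string such as "r-x" for the caller's permission bits.
    static boolean allowed(Kind kind, String rwx, Action action) {
        boolean r = rwx.charAt(0) == 'r';
        boolean w = rwx.charAt(1) == 'w';
        boolean x = rwx.charAt(2) == 'x';
        switch (action) {
            case READ_OR_LIST:
                return r;
            case WRITE_OR_MODIFY:
                return w;
            case ACCESS_CHILDREN:
                // x only has meaning on directories.
                return kind == Kind.DIRECTORY && x;
            case EXECUTE_FILE:
                // Files cannot be executed on HDFS, so the x bit is ignored.
                return false;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(allowed(Kind.FILE, "rwx", Action.EXECUTE_FILE));
        System.out.println(allowed(Kind.DIRECTORY, "r-x", Action.ACCESS_CHILDREN));
    }
}
```

The point of the model is the `EXECUTE_FILE` branch: unlike POSIX, setting x on a file grants nothing, while on a directory it still gates access to the children.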

Praveen Sripati answered Nov 15 '22 08:11
