In bioinformatics, we have been working more and more with cluster-based deployments like Kubernetes, Spark, and Hadoop. The term POSIX storage keeps coming up in documentation.
What is the difference between POSIX storage and NFS or block storage (EBS)? Are the terms interchangeable? Does it basically mean anything that isn't object storage (S3) or Microsoft (SMB, CIFS)?
My understanding is:
POSIX storage refers to any storage that can be accessed using POSIX filesystem functions (i.e. the usual 'fopen') and that complies with POSIX filesystem requirements. This means it must provide facilities such as POSIX attributes and atomic file locking that strictly follow POSIX semantics.
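To make those facilities a bit more concrete, here is a minimal Python sketch that exercises a few POSIX-style operations in a directory: exclusive creation, an advisory lock, permission bits, and rename. The function name `probe_posix_behaviour` and the choice of checks are my own assumptions for illustration. Passing this probe does not prove that a filesystem is POSIX compliant; it only shows what kind of calls "POSIX storage" is expected to support.

```python
import fcntl
import os
import stat
import tempfile


def probe_posix_behaviour(directory):
    """Try a few POSIX-style file operations in `directory`.

    Hypothetical helper: returns a dict of check name -> bool.
    It is a rough probe, not a compliance test.
    """
    results = {}
    path = os.path.join(directory, "probe.txt")
    renamed = os.path.join(directory, "probe.renamed")

    # O_CREAT | O_EXCL must create the file only if it does not exist yet.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600)
    try:
        try:
            os.close(os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600))
            results["exclusive_create"] = False
        except FileExistsError:
            results["exclusive_create"] = True

        # Advisory lock through fcntl; some filesystems reject this call.
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.lockf(fd, fcntl.LOCK_UN)
            results["advisory_lock"] = True
        except OSError:
            results["advisory_lock"] = False

        # Permission bits should be stored and read back unchanged.
        try:
            os.fchmod(fd, 0o640)
            mode = stat.S_IMODE(os.fstat(fd).st_mode)
            results["permission_bits"] = mode == 0o640
        except OSError:
            results["permission_bits"] = False

        os.write(fd, b"hello")
        os.fsync(fd)  # ask the OS to flush the data to the storage device
    finally:
        os.close(fd)

    # After rename, the old name must be gone and the new name present.
    os.rename(path, renamed)
    results["rename"] = os.path.exists(renamed) and not os.path.exists(path)
    os.remove(renamed)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(probe_posix_behaviour(tmp))
```

Pointing the probe at a directory on the storage you are unsure about (instead of a temporary directory) gives a quick first impression, but a result of all `True` only means these particular calls worked once.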
This is normally storage that is attached to the host (either directly or via a SAN) through a POSIX operating system. In addition, the filesystem has to be POSIX-capable.
NFS, CIFS, other NAS filesystems, as well as HDFS (Hadoop) are not POSIX compatible. These work on top of network protocols, usually backed by some other filesystem, and their access semantics don't allow for POSIX compatibility (but see @SteveLoughran's note about NFS).
NTFS and FAT are filesystems, but they are not POSIX capable (they don't support locking with the same semantics). Windows doesn't provide POSIX compatible functions either, but even Linux cannot be fully POSIX-storage-compatible on these filesystems. They are not "POSIX storage".
Amazon EBS volumes are block storage (SAN), so once a volume is attached to your host, if the filesystem you use is POSIX, and you are running a POSIX operating system, you can consider it "POSIX storage".
S3 is not a filesystem, it has its own object access API, and hence it cannot support POSIX file functions.
Most typical Linux filesystems (when mounted directly by a POSIX host) are POSIX capable (e.g. ext3, ext4, xfs, zfs).
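If you want to know which filesystem a given path actually sits on, one option on Linux is to look it up in the mount table. The sketch below parses text in the format of /proc/mounts and returns the filesystem type of the longest matching mount point. The function names and the two sets of type names are assumptions for illustration: the names follow what Linux usually shows in /proc/mounts, and the grouping simply mirrors the classification given above.

```python
import os

# Assumption: type names as they usually appear in /proc/mounts on Linux.
LOCAL_POSIX_FS = {"ext3", "ext4", "xfs", "zfs"}
NOT_POSIX_STORAGE = {"nfs", "nfs4", "cifs", "vfat", "ntfs"}


def filesystem_type(path, mounts_text):
    """Return the filesystem type for `path`, or None if nothing matches.

    `mounts_text` is text in /proc/mounts format:
    device, mount point, type, options, ...
    Mount points containing spaces are escaped in that file and are
    not handled by this sketch.
    """
    path = os.path.abspath(path).rstrip("/") + "/"
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Keep the longest mount point that is a prefix of the path;
        # on ties the later entry wins, as it shadows the earlier one.
        if path.startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def classify(fs_type):
    """Map a filesystem type to the rough categories used in this answer."""
    if fs_type in LOCAL_POSIX_FS:
        return "typical POSIX-capable filesystem"
    if fs_type in NOT_POSIX_STORAGE:
        return "not POSIX storage according to the answer above"
    return "unknown"


if __name__ == "__main__":
    with open("/proc/mounts") as handle:  # Linux only
        print(classify(filesystem_type(".", handle.read())))
```

Any type that is not in either set is reported as unknown on purpose, since whether it behaves like POSIX storage has to be checked case by case.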