Are there any distributed high-availability filesystems (for Linux) that are actively-developed?

Let me be more specific:

  • Distributed means it deals gracefully with client-to-server latencies like you'd find over the public worldwide internet (300ms and up being commonplace) and occasional connectivity flakiness. This means really good client-side caching (i.e. with callbacks) is required. NFS does not do this. It also means encryption of on-the-wire data without needing an IPSEC VPN.

  • High availability means that data can be stored on multiple servers and the client is smart enough to try another server if it encounters problems. Putting that intelligence in the client is really important, and it's why this sort of thing can't just be grafted onto NFS. At a minimum this needs to be possible for read-only data. It would be nice for read-write data but I know that's hard.

  • Filesystem means a kernel driver exporting a POSIX interface, with permissions and access control enforced even in the face of untrustworthy clients. SAN systems often assume the clients are trustworthy.

I'm an OpenAFS refugee. I love it but at this point I can no longer accept its requirement that all the file servers effectively "have root" on all other file servers. The proprietary disk format and overhead of having to run Kerberos infrastructure (which I wouldn't otherwise need) are also becoming increasingly problematic.

Are there any systems other than OpenAFS with these properties? Intermezzo and Coda probably qualify but aren't active projects any longer. Lustre is cool but seems to be designed for ultra-low-latency data centres. Ceph is awesome but not really a filesystem, more of a thing that runs under a filesystem (yes, there's CephFS, but it's really a showcase for Ceph and explicitly not production-ready and there's no timetable for that). Tahoe-LAFS is cool but it and GoogleFS aren't really filesystems in that they don't export a POSIX interface through a kernel module. My understanding of GFS (Global Filesystem) is that the clients can manipulate the on-disk data structures directly, so they're implicitly root-level trusted (and this is part of why it's fast) -- correct me if I'm wrong here.

Needs to be open source since I can't afford to have my data locked up in something proprietary. I don't mind paying for software, but I can't be held hostage in this situation.

Thanks,

Adam asked Nov 01 '22 01:11


1 Answer

First of all, you can use a local file system (mounted with -o user_xattr) to cache NFS (mounted with -o fsc) using cachefilesd (provided by the cachefilesd package on Debian) through the FS-Cache facility.
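
As a rough sketch of that setup on Debian, the configuration could look something like the fragments below. The server name, export path, mount point and cache device are placeholders I made up for illustration; the fsc and user_xattr options and the cachefilesd settings are the standard ones, but check them against your own release before relying on this.

```
# /etc/default/cachefilesd
# Enable the caching daemon (it ships disabled on Debian).
RUN=yes

# /etc/cachefilesd.conf
# Directory on the local file system that holds the cache.
dir /var/cache/fscache
tag mycache

# /etc/fstab
# Local file system backing the cache, with extended attributes enabled.
# /dev/sdb1 is a hypothetical device.
/dev/sdb1                       /var/cache/fscache  ext4  user_xattr  0  2
# NFS mount with FS-Cache enabled via the fsc option.
# fileserver.example.com:/export and /mnt/nfs are hypothetical.
fileserver.example.com:/export  /mnt/nfs            nfs   fsc         0  0
```

Note that this only adds a local read cache in front of NFS. It does not give you server failover or on-the-wire encryption.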

Although the file system you are looking for probably does not exist, IMHO two projects come pretty close, both with fairly good FUSE client implementations:

  • LizardFS (GPL-3 licensed, hosted on GitHub), a fork of the now-proprietary MooseFS.

  • Gfarm file system (BSD/Apache-2.0, hosted at SourceForge)

After evaluating Ceph for quite a while, I came to the conclusion that it is flawed (with no hope of improvement in the foreseeable future) and not suitable for serious use. XtreemFS is a disappointment too. I hope that the upcoming OrangeFS version 3 (with promised data integrity checks) might not be too bad, but that remains to be seen...

Onlyjob answered Nov 11 '22 14:11