
Best distributed filesystem for commodity linux storage farm [closed]

I have a lot of spare Intel Linux servers lying around (hundreds) and want to use them for a distributed file system in a web hosting and file sharing environment. This isn't for an HPC application, so high performance isn't critical. The main requirement is high availability: if one server goes offline, the data stored on its hard drives must still be available from other nodes. It must run over TCP/IP and provide standard POSIX file permissions.

I've looked at the following:

  • Lustre (http://wiki.lustre.org/index.php?title=Main_Page): Comes really close, but it doesn't provide redundancy for data on a node. You must make the data HA yourself using RAID or DRBD. It is open source and supported by Sun, so it should be around for a while.

  • gfarm (http://datafarm.apgrid.org/): Looks like it provides the redundancy but at the cost of complexity and maintainability. Not as well supported as Lustre.

Does anyone have any experience with these or any other systems that might work?

Eric asked Nov 06 '08 15:11



2 Answers

Check also GlusterFS.

Edit (Aug-2012): Ceph is finally getting ready. Recently the authors formed Inktank, an independent company to sell commercial support for it. According to some presentations, the mountable POSIX-compliant filesystem is the uppermost layer and not really tested yet, but the lower layers have been used in production for some time now.

The interesting part is the RADOS layer, which presents object-based storage with both 'native' access via the librados library (available for several languages) and an Amazon S3-compatible REST API. Either one makes it more than adequate for adding massive storage to a web service.
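To give a feel for the 'native' librados access mentioned above, here is a minimal sketch in Python. It assumes the `rados` Python binding (usually packaged as python3-rados) is installed and a cluster is reachable; the pool name `data`, the config path, and the helper names `open_pool`, `put_object` and `get_object` are placeholders I made up for illustration, not anything Ceph prescribes.

```python
"""Sketch: store and fetch one object through Ceph's RADOS layer.

Assumptions (not from the original post): the python `rados` binding is
installed, /etc/ceph/ceph.conf points at a reachable cluster, and a pool
named "data" exists. Helper names are hypothetical.
"""


def open_pool(pool_name="data", conffile="/etc/ceph/ceph.conf"):
    # Imported lazily so the two helpers below can be used (and tested)
    # on a machine without Ceph installed.
    import rados

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    # An I/O context is bound to one pool; all object calls go through it.
    return cluster, cluster.open_ioctx(pool_name)


def put_object(ioctx, name, payload):
    # write_full replaces the whole object with `payload` (bytes).
    ioctx.write_full(name, payload)


def get_object(ioctx, name):
    # stat returns (size, mtime); ask read for exactly `size` bytes,
    # because read's default length is only a few kilobytes.
    size, _mtime = ioctx.stat(name)
    if size == 0:
        return b""
    return ioctx.read(name, size)
```

Against a real cluster this would be used roughly as `cluster, ioctx = open_pool()`, then `put_object(ioctx, "upload-1", b"...")` and `get_object(ioctx, "upload-1")`, followed by `ioctx.close()` and `cluster.shutdown()`. Replication across nodes is handled by the pool's settings on the cluster side, not by this client code.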

This video is a good description of the philosophy, architecture, capabilities and current status.

Javier answered Sep 18 '22 07:09


In my opinion, the best file system for Linux is MooseFS. It's quite new, but I had an opportunity to compare it with Ceph and Lustre, and I can say for sure that MooseFS is the best one.

Adrian Goldberg answered Sep 19 '22 07:09