Lustre, Gluster or MogileFS for video storage, encoding and streaming? [closed]

So many options and so little time to test them all... I wonder if anyone has experience with distributed file systems for video streaming and storage/encoding.

I have a lot of huge video files (50GB to 250GB) that I need to store somewhere, encode to mp4, and stream from several Adobe FMS servers. The only way to handle all this is with a distributed file system, but now the question is: which one?

My research so far tells me:

  • Lustre: mature, proven solution used by a lot of big companies; best with files >10GB; implemented as a kernel driver.
  • Gluster: new, less mature, FUSE based, which means easy to install but maybe slower due to FUSE overhead. Better at handling a large number of smaller files (~1GB).
  • MogileFS: seems to be only for small files (~MB); uses HTTP for access (?); possible FUSE binding in the future.

So far Lustre seems the winner, but I would like to hear about real experiences with my particular application.

Hadoop, Red Hat GFS, Coda and Windows DFS also sound like options, so any experiences are welcome. If someone has benchmarks, please share.

After some real experience this is what I have learned:

  • Lustre:
    • Performance: Amazingly fast! I can confirm that Lustre can serve a lot of streams and that encoding speed is not affected by accessing files via Lustre.
    • POSIX compatibility: Very good! No need to modify applications to use Lustre.
    • Replication, load balancing and failover: Very bad! For replication, load balancing and failover we need to rely on other software such as virtual IPs and DRBD.
    • Installation: The worst! Impossible to install by mere mortals. Requires a very specific combination of kernel, Lustre patches and tweaks to get it working. And current Lustre patches usually work only with old kernels that are incompatible with new hardware/software.
  • MogileFS:
    • Performance: Good for small files but not usable for medium to large files. This is mostly due to HTTP overhead, since all files are sent and received via HTTP requests that encode all data in base64, adding a 33% overhead to each file.
    • POSIX compatibility: Nonexistent. All applications must be modified to use MogileFS, which renders it useless for streaming/encoding since most streaming servers and encoding tools do not understand the MogileFS protocol.
    • Replication, load balancing and failover: Replication and failover work out of the box, and load balancing can be implemented in the application by accessing more than one tracker at a time.
    • Installation: Relatively easy, and ready-to-use packages exist in most distributions. The only difficulty I found was setting up the database master-slave replication to eliminate the single point of failure.
  • Gluster:
    • Performance: Very bad for streaming. I cannot reach more than a few Mbps on a 10Gbps network. Client and server CPU usage skyrockets on heavy writes. It works for encoding because the CPU is saturated before the network and I/O.
    • POSIX compatibility: Almost compatible. The tools I use can access Gluster mounts as normal folders on disk, but in some edge cases things start causing problems. Check the Gluster mailing lists and you will see there are a lot of problems.
    • Replication, load balancing and failover: The best, if they actually worked! Gluster is very new and it has a lot of bugs and performance problems.
    • Installation: Very easy. The management command line is amazing, and setting up replicated, striped and distributed volumes among several servers could not be any easier.
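
The 33% figure quoted for base64 above follows from the encoding itself: every 3 input bytes become 4 output characters. A minimal Python sketch of that arithmetic (the helper name `base64_overhead` and the 3 MB payload size are illustrative assumptions; this does not talk to MogileFS at all):

```python
import base64
import os


def base64_overhead(num_bytes: int) -> float:
    """Fractional size increase from base64-encoding num_bytes of random data."""
    payload = os.urandom(num_bytes)        # stand-in for file content
    encoded = base64.b64encode(payload)    # 4 output bytes per 3 input bytes
    return (len(encoded) - num_bytes) / num_bytes


# 3,000,000 input bytes encode to 4,000,000 bytes.
print(f"{base64_overhead(3_000_000):.1%}")  # prints 33.3%
```

For a 250GB source file, that ratio would mean roughly 83GB of extra bytes on the wire, which is why this kind of overhead matters far more for video than for thumbnails.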

Final conclusion:

Unfortunately the conclusion is "No single silver bullet".

Currently we keep our media files in Gluster 3.2 in a replicated volume for storage and transcoding. As long as you don't have a lot of servers and you avoid geo-replication and striped volumes, things work OK.

When we are going to stream the media files, we copy them to a Lustre volume that is replicated to a second Lustre volume via DRBD. The Wowza server then reads the media files from the Lustre volumes.
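
How long that copy step takes is bounded by network throughput. A rough back-of-the-envelope sketch in Python, assuming decimal units (1GB = 8e9 bits) and a hypothetical `efficiency` factor for protocol overhead; the function name and parameters are illustrative, not measurements from this setup:

```python
def transfer_seconds(size_gb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Ideal time in seconds to move size_gb gigabytes over a link_gbps link.

    efficiency is an assumed fraction (0..1] of line rate actually achieved.
    """
    bits = size_gb * 8e9
    return bits / (link_gbps * 1e9 * efficiency)


# File sizes from the question (50GB and 250GB) on a 10Gbps network.
for size in (50, 250):
    print(f"{size}GB: {transfer_seconds(size, 10):.0f} s at line rate")
```

At full line rate a 250GB file takes about 200 seconds. At a hypothetical 5Mbps (0.005Gbps) the same file would take about 400,000 seconds, or more than four days, so sustained throughput decides whether a copy-before-streaming workflow is practical.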

And finally we use MogileFS to serve the thumbnails in our web application servers.

Horacio asked May 27 '09 17:05
1 Answer

GlusterFS has improved a lot since then. It now provides "granular locking" for large files. See here: http://www.gluster.org/community/documentation/index.php/WhatsNew3.3 How well it works also depends heavily on the video frame rates you need to handle. If you will not go up to 4K rates, Gluster can solve the storage problems. If there is a huge demand for speed, InfiniBand can come into play.

Kunthar answered Oct 04 '22 12:10