My job is to design a distributed system for static image/video files. The size of the data is about tens of Terabytes. It's mostly for HTTP access (thus no processing on data; or only simple processing such as resizing- however it's not important because it can be done directly in the application).
To be a little more clear, it's a system that:
I am considering:
Native network file system: But it seems not feasible because the data can not fit into one machine.
Hadoop filesystem. I worked with Hadoop mapreduce before, but I have no experience using Hadoop as a static file repository for HTTP requests. So I don't know if it's possible or if it's a recommended way.
MogileFS. It seems promising, but I feel that using MySQL to manage local files (on a single machine) will create too much overhead.
Any suggestion please?
I am the author of Weed-FS. For your requirement, WeedFS is ideal. Hadoop can not handle many small files, in addition to your reasons, each file needs to have an entry in the master. If the number of files are big, the hdfs master node can not scale.
Weed-FS is getting faster when compiled with latest Golang releases.
Many new improvements have been done on Weed-FS recently. Now you can test and compare very easily with the built-in upload tool. This one upload all files recursively under a directory.
weed upload -dir=/some/directory
Now you can compare by "du -k /some/directory" to see the disk usage, and "ls -l /your/weed/volume/directory" to see the Weed-FS disk usage.
And I suppose you would need replication with data center, rack aware, etc. They are in now!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With