
Distributed caching for large objects

I want to share a very large object, on the order of megabytes or even several gigabytes, between a set of machines. The object will be written once but may be read many times. A naive approach is to use centralized storage such as Redis. However, it may become a single point of failure, and too many requests may effectively amount to a DoS attack on Redis. A distributed solution is therefore much more promising. The main concern is then replicating the object to all machines. If replication is done via a master/slave technique, it may put a huge traffic load on the master because the object is large. A better solution would be a P2P strategy for replicating the object, in order to decrease the network load on the master.

Does anybody know a solution for this problem? Some candidates might be:
- Redis
- Memcached
- Voldemort
- Hazelcast

My major concerns are a Java interface, sharing big objects, high availability, and low network traffic for replication.

Thanks in advance.

Saeed Shahrivari asked Jan 28 '13 16:01


1 Answer

Caching large objects in NoSQL stores is generally not a good idea, because it is expensive in terms of memory and network bandwidth. I don't think NoSQL solutions shine when it comes to storing large objects. Redis, memcached, and most other key/value stores are clearly not designed for this.

If you want to store large objects in NoSQL products, you need to cut them into small pieces and store the pieces as independent objects. This is the approach taken by 10gen for GridFS (which is part of the standard MongoDB distribution):

See http://docs.mongodb.org/manual/applications/gridfs/
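To illustrate the "cut into small pieces" idea, here is a minimal Java sketch. It is not GridFS and not the API of any particular store: the class name `ChunkedObjectStore`, the `name:index` key scheme, and the `name:meta` entry are assumptions made up for this example, and a plain `Map` stands in for whatever key/value client you actually use.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: splits a large byte array into fixed-size chunks and
// stores each chunk under its own key. The Map is a stand-in for a real
// key/value store client.
class ChunkedObjectStore {
    private final Map<String, byte[]> store;
    private final int chunkSize;

    ChunkedObjectStore(Map<String, byte[]> store, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.store = store;
        this.chunkSize = chunkSize;
    }

    // Writes the chunks as "name:0", "name:1", ... and returns the chunk count.
    int put(String name, byte[] data) {
        // long arithmetic avoids int overflow for arrays close to 2 GB
        int count = (int) ((data.length + (long) chunkSize - 1) / chunkSize);
        for (int i = 0; i < count; i++) {
            int from = i * chunkSize;
            int to = (int) Math.min((long) from + chunkSize, data.length);
            store.put(name + ":" + i, Arrays.copyOfRange(data, from, to));
        }
        // The metadata entry is written last, so a reader that finds it can
        // expect all chunks to be present already.
        byte[] meta = ByteBuffer.allocate(8).putInt(count).putInt(data.length).array();
        store.put(name + ":meta", meta);
        return count;
    }

    // Reassembles the object, or returns null if it was never stored.
    byte[] get(String name) {
        byte[] meta = store.get(name + ":meta");
        if (meta == null) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.wrap(meta);
        int count = buf.getInt();
        int length = buf.getInt();
        byte[] out = new byte[length];
        int offset = 0;
        for (int i = 0; i < count; i++) {
            byte[] chunk = store.get(name + ":" + i);
            if (chunk == null) {
                throw new IllegalStateException("missing chunk " + i + " of " + name);
            }
            System.arraycopy(chunk, 0, out, offset, chunk.length);
            offset += chunk.length;
        }
        return out;
    }

    public static void main(String[] args) {
        ChunkedObjectStore s = new ChunkedObjectStore(new HashMap<>(), 1024);
        byte[] data = new byte[5000];
        int chunks = s.put("big-object", data);
        System.out.println(chunks + " chunks, round trip ok: "
                + Arrays.equals(data, s.get("big-object")));
    }
}
```

A Java `byte[]` cannot exceed about 2 GB, so for the multi-gigabyte case you would stream chunks from and to a file or channel instead of holding the whole object in memory.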

To store large objects, I would rather look at distributed filesystems such as:

  • Ceph
  • GlusterFS
  • MapR-FS

These systems are scalable, highly available, and provide both file and object interfaces (you probably need an object interface). You can also refer to the following SO question to choose a distributed filesystem.

Best distributed filesystem for commodity linux storage farm

It is then up to you to implement a cache on top of these scalable storage solutions.
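As a rough sketch of what such a cache could look like on each machine, here is a minimal read-through cache in Java. The class name `ReadThroughCache` and the `loader` function are assumptions for illustration; the loader stands in for whatever call actually fetches the object from the distributed filesystem. Since the object is written once and only read afterwards, the sketch does no invalidation.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-machine cache: the first read of a key goes to the
// backing store through the loader, later reads are served locally.
class ReadThroughCache {
    private final ConcurrentHashMap<String, byte[]> local = new ConcurrentHashMap<>();
    private final Function<String, byte[]> loader;

    ReadThroughCache(Function<String, byte[]> loader) {
        this.loader = loader;
    }

    byte[] get(String key) {
        // computeIfAbsent runs the loader at most once per key, even when
        // several threads ask for the same key at the same time. If the
        // loader returns null, nothing is cached and null is returned.
        return local.computeIfAbsent(key, loader);
    }

    public static void main(String[] args) {
        // Stand-in loader; a real one would read from the storage layer.
        ReadThroughCache cache = new ReadThroughCache(
                key -> key.getBytes(StandardCharsets.UTF_8));
        byte[] first = cache.get("big-object");
        byte[] second = cache.get("big-object");
        System.out.println("served from cache: " + (first == second));
    }
}
```

For objects too large to keep on the heap, the same pattern works with a local file as the cached copy instead of a `byte[]`.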

Didier Spezia answered Sep 17 '22 17:09