
Best solution for a caching system that supports sharding, replication and has low latency

We're in the process of deploying a highly dynamic website. About 20,000 items are processed and updated every minute at peak capacity. Each item can range in size from 1 KB to 500 KB. These items need to be retrieved, processed and updated in the cache every minute.
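As a rough back-of-the-envelope check on those numbers, the write volume can be bounded with a few lines of Python. This is only a sketch: it assumes 1 KB means 1024 bytes and that every item is rewritten each minute, and the constant and function names are made up for illustration.

```python
# Figures taken from the question; the names are illustrative only.
ITEMS_PER_MINUTE = 20_000
MIN_ITEM_BYTES = 1 * 1024      # assumption: 1 KB = 1024 bytes
MAX_ITEM_BYTES = 500 * 1024


def write_volume_per_minute(item_bytes, items=ITEMS_PER_MINUTE):
    """Bytes written to the cache per minute if every item has this size."""
    return item_bytes * items


low = write_volume_per_minute(MIN_ITEM_BYTES)
high = write_volume_per_minute(MAX_ITEM_BYTES)

print(f"items per second: {ITEMS_PER_MINUTE / 60:.0f}")
print(f"bytes per minute: {low / 2**20:.1f} MiB to {high / 2**30:.2f} GiB")
```

So the cache has to absorb somewhere between about 20 MiB and about 9.5 GiB of writes per minute, depending on how the item sizes are distributed.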

We are expecting traffic of up to 1,000 users in the first two to three months. As users land on the website, some will request popular content while others will request unpopular content. All content is a higher-level, processed form of what sits in the persistent store. Hence it is absolutely necessary to have all the processed items, popular or unpopular, sitting in a low-latency store for a superb user experience.

We've tried Memcache, Redis and Couchbase separately.

Memcache is super fast but we ran into issues where certain slabs ran out of memory and active items started getting evicted.
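For readers unfamiliar with the slab issue: memcached groups items into slab classes by chunk size, and each class evicts from its own LRU, so one class can run out of pages and start evicting while other classes still have room. The following is a simplified Python model of how items between 1 KB and 500 KB spread across classes. It assumes a base chunk size of 96 bytes, a growth factor of 1.25 and 1 MB pages; the real values depend on the memcached version and startup flags, the real item size also includes the key and a header, and the function names are hypothetical.

```python
import bisect


def slab_chunk_sizes(base=96, factor=1.25, page_size=1024 * 1024, align=8):
    """Approximate chunk size of each slab class (simplified model)."""
    sizes = []
    size = base
    while size <= page_size / factor:
        if size % align:
            size += align - (size % align)  # round up to the alignment
        sizes.append(size)
        size = int(size * factor)
    sizes.append(page_size)  # the largest class holds one item per page
    return sizes


def slab_class_for(item_bytes, sizes):
    """Index of the smallest class whose chunk fits the item, or None."""
    idx = bisect.bisect_left(sizes, item_bytes)
    return idx if idx < len(sizes) else None


sizes = slab_chunk_sizes()
for item_bytes in (1 * 1024, 50 * 1024, 500 * 1024):
    cls = slab_class_for(item_bytes, sizes)
    print(f"{item_bytes} bytes -> class {cls}, chunk {sizes[cls]} bytes")
```

With item sizes this varied, the working set lands in many different classes, which is consistent with some slabs filling up and evicting active items while memory assigned to other classes sits unused.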

Redis, relatively slower than Memcache, is great if you want persistence in the items.

However, we soon realized we wanted sharding and replication.

Couchbase offered that out of the box. However, the Moxi client that interfaces with the Couchbase server has problems of its own: it cannot handle heavily concurrent processes and starts missing sets and gets every now and then. We moved over to the Python SDK instead. It performed poorly when one of the nodes in the cluster went down, as it wasn't able to discover the new topology at all. We ended up losing some cached data, and the site was inactive for several precious hours.

We are at a point where we realize that there is no perfect product out there that will suit our needs. You have to be aware of all the technologies and of your own needs. You have to foresee how your data will evolve and be prepared accordingly. The best solution is probably a hybrid of technologies. However, we're putting this out in the hope that maybe there is something out there. We're approaching the end of 2012. How hard can it be for an out-of-the-box solution backed by powerful hardware to deliver what we need?

Any thoughts and links to insightful articles would be greatly appreciated. Thanks!

Shah W asked Oct 26 '12 00:10



1 Answer

Here are a few notes about some of the technologies you have mentioned above.

Memcached:

Memcached is only a caching system and will not provide you with any data persistence. If you choose to use memcached, you will need some other type of persistent store to keep all of your data. Memcached is also a very simple caching system and does not provide replication, but there are separate projects (like repcached) that have added features like this to memcached. I would only use memcached if I wanted to use a relational database as my persistence layer.

Redis:

Redis is a data structure server and should only be used for that purpose. The downside to Redis is that you can only run it on a single server, and if you want to have multiple Redis servers then you need to do application-level sharding. Most of the Redis deployments I have seen run alongside another database technology.
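To give an idea of what application-level sharding involves, here is a minimal consistent-hashing sketch in Python that maps each key to one of several cache servers. It is only an illustration under stated assumptions: the HashRing class, the node names and the number of virtual nodes are all made up, and it does not talk to Redis at all; it only decides which server a key belongs to.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring that maps keys to node names (illustrative)."""

    def __init__(self, nodes, replicas=100):
        if not nodes:
            raise ValueError("at least one node is required")
        # Each node is placed on the ring `replicas` times (virtual nodes)
        # so that keys spread more evenly across nodes.
        ring = []
        for node in nodes:
            for i in range(replicas):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._hashes = [h for h, _ in ring]
        self._nodes = [n for _, n in ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        """Return the first node clockwise from the key's position."""
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._nodes[idx]


# Hypothetical server names; the application would open one client per node.
ring = HashRing(["cache-a:6379", "cache-b:6379", "cache-c:6379"])
print(ring.node_for("item:12345"))
```

The point of using a ring rather than a plain hash modulo the server count is that when a server is removed, only the keys that lived on that server move; the rest keep their placement, which limits how much of the cache goes cold.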

Couchbase:

Couchbase 2.0 will turn the product into a document database. The product has memcached technology inside it, so you get memcached out of the box, which means sub-millisecond latencies. On top of this you get replication, cross data center replication and querying support. Also, note that most Couchbase SDKs don't use moxi and that the Python SDK is still in beta.

One thing that might be useful for you to do is to check out the YCSB benchmarking project along with some of the results that have already been published. This project will give you a good idea of how these and other databases perform under load. Then, once you find some you like, you can look through their feature lists and figure out which product has the features that best fit the application you're developing.

Also, if any of my information about the databases above is incorrect please let me know. These projects are evolving quickly and sometimes it's hard to keep up.

EDIT: I should also mention that Couchbase is the only database out of the ones listed that provides replication, sharding, and low latency. I imagine Redis will allow you to have a replica server and therefore replication, but any sharding you do will have to be done at the application layer.

mikewied answered Sep 28 '22 07:09
