 

MongoDB vs. Redis vs. Cassandra for a fast-write, temporary row storage solution

I'm building a system that tracks and verifies ad impressions and clicks. This means that there are a lot of insert commands (about 90/second average, peaking at 250) and some read operations, but the focus is on performance and making it blazing-fast.

The system is currently on MongoDB, but I've been introduced to Cassandra and Redis since then. Would it be a good idea to go to one of these two solutions, rather than stay on MongoDB? Why or why not?

Thank you

Mark Bao asked Jun 09 '10


1 Answer

For a harvesting solution like this, I would recommend a multi-stage approach. Redis is good at real-time communication. It is designed as an in-memory key/value store and inherits some very nice benefits of being a memory database, such as O(1) list operations. As long as there is RAM to spare on the server, Redis will not slow down when pushing to the end of your lists, which is exactly what you need when inserting items at such an extreme rate. Unfortunately, Redis can't operate on data sets larger than the amount of RAM you have (it only writes to disk for persistence; that data is read back when the server restarts or after a crash), and scaling has to be done by you and your application. (A common way is to spread keys across numerous servers, which is implemented by some Redis drivers, especially those for Ruby on Rails.) Redis also has support for simple publish/subscribe messaging, which can be useful at times as well.

In this scenario, Redis is "stage one." For each specific type of event you create a list in Redis with a unique name; for example, we have "page viewed" and "link clicked." For simplicity, we want the data in each list to have the same structure: "link clicked" may have a user token, link name and URL, while "page viewed" may only have the user token and URL. Your first concern is just capturing the fact that the event happened, so you push only the data that is absolutely necessary.
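
As a rough sketch of that producer side (using Python and the redis-py client purely as an example; any language with a Redis driver works the same way, and the list and field names here are just placeholders):

    # Stage one sketch: push raw events onto per-type Redis lists.
    import json
    import time

    import redis  # redis-py client

    r = redis.Redis(host="localhost", port=6379)

    def record_page_view(user_token, url):
        event = {"user": user_token, "url": url, "ts": time.time()}
        # RPUSH is O(1), so ingest stays fast no matter how long the list gets
        r.rpush("page_viewed", json.dumps(event))

    def record_link_click(user_token, link_name, url):
        event = {"user": user_token, "link": link_name, "url": url, "ts": time.time()}
        r.rpush("link_clicked", json.dumps(event))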

Next we have some simple processing workers that take this frantically inserted information off of Redis' hands by asking it to pop an item off the list and hand it over. The worker can do any adjustments, deduplication, or ID lookups needed to properly file the data, and then hand it off to a more permanent storage site. Fire up as many of these workers as you need to keep Redis' memory load bearable. You can write the workers in anything you wish (Node.js, C#, Java, ...) as long as it has a Redis driver (most web languages do now) and one for your desired storage (SQL, Mongo, etc.).
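
A minimal worker sketch in the same vein, draining the "page_viewed" list into MongoDB (the list key, database and collection names are placeholders, and the dedup/lookup step is just a comment where your own logic would go):

    import json

    import redis
    from pymongo import MongoClient

    r = redis.Redis(host="localhost", port=6379)
    page_views = MongoClient("mongodb://localhost:27017")["tracking"]["page_views"]

    while True:
        # BLPOP blocks until an item is available; pops from the head so events
        # pushed with RPUSH are processed in arrival order. Returns None on timeout.
        item = r.blpop("page_viewed", timeout=5)
        if item is None:
            continue  # nothing to process right now
        _, raw = item
        event = json.loads(raw)
        # deduplication / ID lookups would happen here before the permanent write
        page_views.insert_one(event)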

MongoDB is good at document storage. Unlike Redis, it can handle databases larger than RAM, and it supports sharding/replication on its own. An advantage of MongoDB over SQL-based options is that you don't need a predetermined schema; you're free to change the way data is stored however you want, at any time.

I would, however, suggest using Redis or Mongo only for the "stage one" phase of holding data for processing, and a traditional SQL setup (Postgres or MSSQL, perhaps) to store the post-processed data. Tracking client behavior sounds like relational data to me, since you may want to ask "show me everyone who viewed this page," "how many pages did this person view on this given day," or "what day had the most viewers in total?" There may be even more complex joins or queries for analytics that you come up with, and mature SQL solutions can do a lot of this filtering for you; NoSQL (Mongo or Redis specifically) can't do joins or complex queries across varied sets of data.
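
For example, once the processed rows land in Postgres, a question like "what day had the most viewers in total?" is a single query. The page_views table, its columns, and the connection string below are all hypothetical, just to illustrate the shape of the query:

    import psycopg2

    conn = psycopg2.connect("dbname=tracking user=postgres")
    with conn.cursor() as cur:
        # "What day had the most distinct viewers in total?"
        cur.execute(
            """
            SELECT viewed_at::date AS day, COUNT(DISTINCT user_token) AS viewers
            FROM page_views
            GROUP BY day
            ORDER BY viewers DESC
            LIMIT 1
            """
        )
        print(cur.fetchone())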

Skrylar answered Oct 13 '22