Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB vs. Cassandra vs. MySQL for real-time advertising platform

I'm working on a real-time advertising platform with a heavy emphasis on performance. I've always developed with MySQL, but I'm open to trying something new like MongoDB or Cassandra if significant speed gains can be achieved. I've been reading about both all day, but since both are being rapidly developed, a lot of the information appears somewhat dated.

The main data stored would be entries for each click, incremented rows for views, and information for each campaign (just some basic settings, etc). The speed gains need to be found in inserting clicks, updating view totals, and generating real-time statistic reports. The platform is developed with PHP.

Or maybe none of these?

like image 862
James Simpson Avatar asked May 28 '11 16:05

James Simpson


People also ask

Is Cassandra better than MongoDB?

Cassandra has the ability to create secondary indexes on other columns than the defined primary key. However, Cassandra will not allow filtering on other columns without a secondary index, while in MongoDB, the query language can filter on non-indexed fields as well.

Is Cassandra free for commercial use?

Cassandra is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

Why is Cassandra faster than MySQL?

Advantages of Cassandra Both horizontal and vertical scalability is an option, as Cassandra uses a linear model for faster responses. Along with scalability, the data storage is flexible. Because it is a NoSQL database, it can deal with structured, unstructured, or semi-structured data.

Is MongoDB more efficient than MySQL?

MongoDB speed debate, MongoDB usually comes out as the winner. MongoDB can accept large amounts of unstructured data much faster than MySQL thanks to slave replication and master replication. Depending on the types of data that you collect, you may benefit significantly from this feature.


1 Answers

There are several ways to achieve this with all of the technologies listed. It is more a question of how you use them. Your ideal solution may use a combination of these, with some consideration for usage patterns. I don't feel that the information out there is that dated because the concepts at play are very fundamental. There may be new NoSQL databases and fixes to existing ones, but your question is primarily architectural.

NoSQL solutions like MongoDB and Cassandra get a lot of attention for their insert performance. People tend to complain about the update/insert performance of relational databases but there are ways to mitigate these issues.

Starting with MySQL you could review O'Reilly's High Performance MySQL, optimise the schema, add more memory perhaps run this on different hardware from the rest of your app (assuming you used MySQL for that), or partition/shard data. Another area to consider is your application. Can you queue inserts and updates at the application level before insertion into the database? This will give you some flexibility and is probably useful in all cases. Depending on how your final schema looks, MySQL will give you some help with extracting the data as long as you are comfortable with SQL. This is a benefit if you need to use 3rd party reporting tools etc.

MongoDB and Cassandra are different beasts. My understanding is that it was easier to add nodes to the latter but this has changed since MongoDB has replication etc built-in. Inserts for both of these platforms are not constrained in the same manner as a relational database. Pulling data out is pretty quick too, and you have a lot of flexibility with data format changes. The tradeoff is that you can't use SQL (a benefit for some) so getting reports out may be trickier. There is nothing to stop you from collecting data in one of these platforms and then importing it into a MySQL database for further analysis.

Based on your requirements there are tools other than NoSQL databases which you should look at such as Flume. These make use of the Hadoop platform which is used extensively for analytics. These may have more flexibility than a database for what you are doing. There is some content from Hadoop World that you might be interested in.

like image 196
Brian Lyttle Avatar answered Sep 21 '22 10:09

Brian Lyttle