Pick database for ads/analytics service

I'm working on a project with an ad exchange service (something like Google DoubleClick), and I have to pick a highly scalable database. I'm considering MongoDB or Cassandra.

Cassandra:

  • Fits our write-intensive system. (+)
  • Aggregation looks hard to do (and it is very important for analytics). Is there a good way? I just read the slides about Twitter Rainbird, which seems good. (?)
  • I don't like Java much. (-)

MongoDB:

  • Seems easier for analytics (it has built-in aggregate functions). (+)
  • Uses more RAM? (because it is document-oriented, versus key-value Cassandra) (?)
  • How does write performance compare to Cassandra? (?)
  • JavaScript shell and a natural fit with node.js (an important part of our project). (+)
  • http://pastebin.com/raw.php?i=FD3xe6Jt - This article makes me cautious. (-)

Can you help me pick one, or answer some of my questions above?

Thanks.

Yoshi asked May 04 '26 07:05


2 Answers

I don't know about Cassandra, but MongoDB has some advantages for analytics: high concurrency, sharding, storing everything about an event in a single document, and features like upsert and $inc.
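To make the upsert and $inc point concrete, here is a minimal sketch of a pre-aggregated hourly counter. The field names (`ad_id`, `hour`, `counts`) and the helper names are hypothetical, and `apply_upsert` is only a toy in-memory stand-in for what the server does, so the example runs without a database.

```python
from copy import deepcopy
from datetime import datetime, timezone


def build_counter_update(ad_id, event_type, when):
    """Return (filter, update) documents for one hourly counter bump.

    Field names are assumptions made for this illustration.
    """
    hour = when.replace(minute=0, second=0, microsecond=0)
    filter_doc = {"ad_id": ad_id, "hour": hour}
    # $inc adds to a numeric field and creates the field if it is missing.
    update_doc = {"$inc": {f"counts.{event_type}": 1, "counts.total": 1}}
    return filter_doc, update_doc


def apply_upsert(store, filter_doc, update_doc):
    """Toy in-memory imitation of an upsert with $inc (illustration only)."""
    doc = next(
        (d for d in store if all(d.get(k) == v for k, v in filter_doc.items())),
        None,
    )
    if doc is None:
        # Upsert: no match, so a new document is seeded from the filter.
        doc = deepcopy(filter_doc)
        store.append(doc)
    for path, amount in update_doc["$inc"].items():
        *parents, leaf = path.split(".")
        target = doc
        for key in parents:
            target = target.setdefault(key, {})
        target[leaf] = target.get(leaf, 0) + amount
    return doc


store = []
t = datetime(2012, 1, 1, 10, 15, tzinfo=timezone.utc)
for event in ("impression", "impression", "click"):
    apply_upsert(store, *build_counter_update("ad-42", event, t))
```

With a real driver such as PyMongo, the same two documents would be passed as `collection.update_one(filter_doc, update_doc, upsert=True)`, so each event costs one write and the hourly totals are ready to read without a later aggregation pass.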

For more detailed explanations check the following resources:

MongoDB Analytics - videos
http://blog.mongodb.org/post/171353301/using-mongodb-for-real-time-analytics
http://www.mongodb.org/display/DOCS/Use+Cases
http://www.slideshare.net/jrosoff/scalable-event-analytics-with-mongodb-ruby-on-rails
http://nosql.mypopescu.com/post/3508305955/fast-asynchronous-analytics-with-mongodb
http://blog.opengovernment.org/2011/02/24/fast-asynchronous-analytics-with-mongodb/
http://blog.10gen.com/post/4416876632/london-startup-ubervu-on-storing-5tb-of-data-in-mongodb

alessioalex answered May 05 '26 19:05


It depends a lot on your domain, but in most cases one would probably choose Mongo.
For example, http://square.github.com/cube/ is built on Mongo.

Cube is an open-source system for visualizing time series data, built on MongoDB, Node and D3. If you send Cube timestamped events (with optional structured data), you can easily build realtime visualizations of aggregate metrics for internal dashboards. For example, you might use Cube to monitor traffic to your website, counting the number of requests in 5-minute intervals.
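The idea of counting requests in 5-minute intervals can be sketched in plain Python. This is not Cube's API, just an illustration of the bucketing step; the function names and sample timestamps are made up for the example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

BUCKET = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts, width=BUCKET):
    """Floor a timezone-aware timestamp to the start of its interval."""
    return ts - ((ts - EPOCH) % width)


def count_per_bucket(timestamps, width=BUCKET):
    """Count events per interval, keyed by the interval's start time."""
    return Counter(bucket_start(ts, width) for ts in timestamps)


# Hypothetical request timestamps.
requests = [
    datetime(2012, 1, 1, 12, 1, tzinfo=timezone.utc),
    datetime(2012, 1, 1, 12, 4, tzinfo=timezone.utc),
    datetime(2012, 1, 1, 12, 7, tzinfo=timezone.utc),
]
counts = count_per_bucket(requests)
```

Here the first two requests fall into the 12:00 bucket and the third into the 12:05 bucket. In a database-backed setup the bucket start would typically become part of the document key, so each incoming event only increments one counter.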

Most use cases for Cassandra stem from the need for high availability, which is its main feature as far as I know. Your needs seem to center on having a cheap way to put queryable data into a scale-out DB, and Mongo almost matches an RDBMS when it comes to querying. Mongo is also probably easier to deal with.

clyfe answered May 05 '26 20:05


