 

Analytics - MongoDB or Cassandra

I'm using MongoDB today and I'm really happy with it. I need to find a solution for event logging. The log includes logging of content impressions and clicks (like an ad system). It's many writes and few reads (mainly for daily reporting). It seems like something like Cassandra is a better solution than MongoDB, which seems better suited to document-oriented data structures. Any thoughts?

Ido Shilon asked Mar 06 '11 00:03


2 Answers

One of the nice things about Cassandra is its support for Hadoop MapReduce, which gives it access to a very robust ecosystem of tools (e.g., Pig), examples, and so forth.
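To make the map/reduce idea concrete, here is a minimal pure-Python sketch of the kind of counting job you might run over an impression/click log with Hadoop (or express in Pig). It is not Hadoop's API: the comma-separated log format, the sample lines, and the function names are all hypothetical, and the shuffle step that Hadoop performs between the map and reduce phases is simulated with a dictionary.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical log format: "timestamp,content_id,event_type" (made-up sample data)
LOG_LINES = [
    "2011-03-06T10:00:01,ad-17,impression",
    "2011-03-06T10:00:02,ad-17,click",
    "2011-03-06T10:05:40,ad-42,impression",
    "2011-03-06T11:12:09,ad-17,impression",
]


def map_phase(line):
    """Emit ((day, content_id, event_type), 1) for one log line."""
    timestamp, content_id, event_type = line.split(",")
    day = timestamp[:10]  # "YYYY-MM-DD" prefix of the ISO timestamp
    yield (day, content_id, event_type), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts emitted for one key."""
    return key, sum(values)


def run_job(lines):
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    for key, count in sorted(run_job(LOG_LINES).items()):
        print(key, count)
```

Because each map call looks at one line and each reduce call looks at one key, both phases can be spread across many machines, which is what makes the pattern a good fit for large, append-only event logs.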

Depending on data volume and use case, you may also want to take advantage of its expiring columns feature (http://www.datastax.com/dev/blog/whats-new-cassandra-07-expiring-columns).
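Expiring columns let you attach a time-to-live (TTL) to a column at write time; once the TTL elapses, the column is no longer returned by reads and is eventually purged. For raw event logs that only matter until the daily report has run, that keeps storage bounded without a separate cleanup job. The sketch below is a toy, in-memory Python model of that behaviour, not Cassandra's client API; `ExpiringStore`, `FakeClock`, the key shape, and the one-day TTL are illustrative assumptions.

```python
import time


class ExpiringStore:
    """Toy in-memory model of TTL ("expiring column") semantics.

    Not Cassandra's API: each value records an absolute expiry time and
    reads treat it as missing once that time has passed.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be demonstrated
        self._data = {}      # key -> (value, expires_at or None)

    def put(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            # Purged lazily on read here; Cassandra removes expired data during compaction.
            del self._data[key]
            return default
        return value


class FakeClock:
    """Manually advanced clock for demonstrating expiry."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


if __name__ == "__main__":
    ONE_DAY = 24 * 60 * 60  # hypothetical retention for raw events
    clock = FakeClock()
    store = ExpiringStore(clock)
    key = ("ad-17", "2011-03-06T10:00:01")
    store.put(key, "impression", ttl_seconds=ONE_DAY)
    print(store.get(key))  # still visible
    clock.now += ONE_DAY
    print(store.get(key))  # expired
```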

Gemini also recently open-sourced its Cassandra real-time log processing tool, which may be similar to what you want (http://www.thestreet.com/story/11030367/1/gemini-releases-real-time-log-processing-based-on-flume-and-cassandra.html, https://github.com/geminitech/logprocessing).

jbellis answered Oct 21 '22 05:10


We have used MongoDB in one of our projects to capture event logs for a distributed app. It works really well, but it makes sense to do some calculations beforehand about storage requirements, sharding, and other factors.
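A back-of-the-envelope storage estimate can start from the raw BSON size of one event document multiplied by the expected event rate. The sketch below computes that size from the BSON specification for flat documents with string, integer, float, and boolean values only. The event fields, the 10 million events/day rate, and the function names are hypothetical, and the result is a lower bound that ignores indexes, storage-engine padding or compression, and replication. Because BSON stores every field name inside every document, it also shows how much shorter keys save.

```python
# Estimate the raw BSON size of flat event documents (per the BSON spec).
# Supports only str, int, float, and bool values; nested documents,
# arrays, dates, etc. are out of scope for this sketch.

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
OBJECT_ID_ELEMENT = 1 + len(b"_id\x00") + 12  # type byte + key + 12-byte ObjectId


def bson_value_size(value):
    """Bytes used by a value (excluding its type byte and key)."""
    if isinstance(value, bool):  # check before int: bool subclasses int
        return 1
    if isinstance(value, int):
        return 4 if INT32_MIN <= value <= INT32_MAX else 8  # int32 or int64
    if isinstance(value, float):
        return 8  # 64-bit double
    if isinstance(value, str):
        return 4 + len(value.encode("utf-8")) + 1  # length prefix + bytes + NUL
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def bson_document_size(doc, include_object_id=True):
    """Flat document size: int32 length + elements + trailing NUL byte."""
    size = 4 + 1
    if include_object_id:
        size += OBJECT_ID_ELEMENT  # an ObjectId _id is added on insert if absent
    for key, value in doc.items():
        # type byte + key as NUL-terminated string + value
        size += 1 + len(key.encode("utf-8")) + 1 + bson_value_size(value)
    return size


# Hypothetical event documents: same data, long vs. short field names.
LONG_KEYS = {"timestamp_utc": "2011-03-06T10:00:01Z", "content_identifier": "ad-17",
             "event_type": "impression", "user_identifier": "u-123456"}
SHORT_KEYS = {"ts": "2011-03-06T10:00:01Z", "c": "ad-17",
              "e": "impression", "u": "u-123456"}

EVENTS_PER_DAY = 10_000_000  # assumed rate; replace with your own

if __name__ == "__main__":
    for name, doc in (("long keys", LONG_KEYS), ("short keys", SHORT_KEYS)):
        per_doc = bson_document_size(doc)
        per_day_gib = per_doc * EVENTS_PER_DAY / 2**30
        print(f"{name}: {per_doc} bytes/doc, ~{per_day_gib:.2f} GiB/day")
```

Multiplying the per-day figure by your retention period, and then by your replication factor, gives a first guess at how many shards you will need.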

As a suggestion, go with a capped collection and have a mapreduce operation run every 24 hours or so to reduce the logs to an aggregate table of the values you want. I have noticed that, because MongoDB is "schema-less" (every document stores its own field names), the documents can cause the database file size to grow really fast.
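The capped-collection-plus-daily-rollup pattern can be modelled in a few lines of plain Python. This is a hedged sketch, not the MongoDB driver API: `CappedLog`, `rollup_day`, the event fields, and the tiny capacity are hypothetical, and the rollup is written as a plain loop rather than a server-side mapReduce (in current MongoDB releases map-reduce is deprecated in favour of the aggregation pipeline). The demo deliberately uses a capacity of 3 so you can see the catch: anything evicted before the rollup runs is simply lost, so the cap has to hold at least a full day of events plus some margin.

```python
from collections import Counter, defaultdict, deque


class CappedLog:
    """Toy model of a capped collection: insertion-ordered, fixed capacity,
    oldest documents evicted automatically once full.

    MongoDB caps by total size in bytes (optionally also by document count);
    this sketch caps by document count only.
    """

    def __init__(self, max_docs):
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        self._docs.append(doc)  # deque(maxlen=...) drops the oldest item when full

    def find(self):
        return list(self._docs)  # natural (insertion) order


def rollup_day(log, day):
    """Reduce one day's raw events to per-content impression/click counts."""
    counts = defaultdict(Counter)
    for event in log.find():
        if event["day"] == day:
            counts[event["content_id"]][event["type"]] += 1
    return {content_id: dict(c) for content_id, c in counts.items()}


if __name__ == "__main__":
    log = CappedLog(max_docs=3)  # far too small on purpose
    for content_id, kind in [("ad-17", "impression"), ("ad-17", "click"),
                             ("ad-42", "impression"), ("ad-17", "impression")]:
        log.insert({"day": "2011-03-06", "content_id": content_id, "type": kind})

    aggregates = {}  # stands in for the aggregate table, keyed by day
    aggregates["2011-03-06"] = rollup_day(log, "2011-03-06")
    print(aggregates)  # the first impression was evicted before the rollup
```

Storing each day's result under its own key means re-running the job for that day overwrites the previous result instead of double-counting it.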

Ankur Chauhan answered Oct 21 '22 05:10