I'm using MongoDB today and I'm really happy with it. I need to find a solution for event logging. The log records content impressions and clicks (like an ad system). It involves many writes and few reads (mainly for daily reporting). It seems like something like Cassandra is a better fit than MongoDB, which seems better suited to document-oriented data structures. Any thoughts?
One of the nice things about Cassandra is its support for Hadoop map/reduce, which gives it access to a very robust ecosystem (e.g., Pig) of tools, examples, and so forth.
Depending on data volume and use case, you may also want to take advantage of its expiring columns feature (http://www.datastax.com/dev/blog/whats-new-cassandra-07-expiring-columns).
Gemini also recently open-sourced its Cassandra real-time log processing tool, which may be similar to what you want (http://www.thestreet.com/story/11030367/1/gemini-releases-real-time-log-processing-based-on-flume-and-cassandra.html, https://github.com/geminitech/logprocessing).
We have used MongoDB in one of our projects to capture event logs for a distributed app. It works really well, but it makes sense to do some calculations beforehand about the amount of storage, sharding and other factors.
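To give an idea of the kind of up-front calculation I mean, here is a rough back-of-the-envelope sketch. Every number in it (events per day, average document size, retention, overhead factor) is a made-up assumption for illustration, not a measurement from our project, so plug in your own figures.

```python
def estimate_storage_gb(events_per_day, avg_doc_bytes, retention_days,
                        overhead_factor=1.5):
    """Rough storage estimate in decimal GB for an event log.

    overhead_factor is an assumed multiplier for indexes, padding and
    preallocated files; it is a guess, not a documented constant.
    """
    total_bytes = events_per_day * avg_doc_bytes * retention_days * overhead_factor
    return total_bytes / 1e9


# Hypothetical workload: 50M events/day, ~200 bytes each, kept for 30 days.
estimate = estimate_storage_gb(50_000_000, 200, 30)
print(f"{estimate:.0f} GB")
```

Dividing the result by what a single machine can comfortably hold gives a first guess at whether you need sharding at all.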
As a suggestion, go with a capped collection and have a map/reduce operation run every 24 hours or so to reduce the logs to an aggregate table of the values you want. I have noticed that, because MongoDB is "schema-less", the documents can cause the db file size to grow really fast.
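The shape of that daily roll-up can be sketched in plain Python. This is only an in-memory stand-in to show the logic, not MongoDB code: the bounded `deque` plays the role of the capped collection, and the field names (`content_id`, `kind`, `ts`) are hypothetical.

```python
from collections import Counter, deque
from datetime import datetime, timezone

# Bounded buffer standing in for a capped collection:
# once it is full, the oldest entries are dropped automatically.
raw_events = deque(maxlen=1_000_000)


def log_event(content_id, kind, ts):
    """Append one raw event (an impression or a click)."""
    raw_events.append({"content_id": content_id, "kind": kind, "ts": ts})


def daily_rollup(events):
    """Map each event to a (day, content_id, kind) key, then reduce by counting."""
    counts = Counter()
    for event in events:
        day = event["ts"].date().isoformat()
        counts[(day, event["content_id"], event["kind"])] += 1
    return counts


log_event("ad-1", "impression", datetime(2011, 3, 1, 9, 0, tzinfo=timezone.utc))
log_event("ad-1", "impression", datetime(2011, 3, 1, 9, 5, tzinfo=timezone.utc))
log_event("ad-1", "click", datetime(2011, 3, 1, 9, 6, tzinfo=timezone.utc))
log_event("ad-2", "impression", datetime(2011, 3, 2, 8, 0, tzinfo=timezone.utc))

summary = daily_rollup(raw_events)
for key, count in sorted(summary.items()):
    print(key, count)
```

The daily reports then read only the small aggregate table, so the raw log can be allowed to roll over.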