
MongoDB vs Cassandra for aggregating, searching and analyzing many logs

I'm working on a project that does log aggregation and analytics as part of a bigger project. I don't know which database to choose for handling these logs. Lately I'm going back and forth between MongoDB and Cassandra, but I'm sure there are others that fit my needs as well. Which one should I choose and why?

The project is still at an early stage, but here are the requirements so far:

  • logs are in the syslog format
  • queries mostly match a short string that is currently embedded in the message, but which I will move into a separate field. There will also be filters on date, severity or tag. Very rarely, people will search for an arbitrary string within the message.
  • hourly analytics from some of the log entries
  • keep the logs for a configurable amount of time
  • more will come, I'm sure :) That's why I'm thinking NoSQL is more appropriate, because we can change the schema.
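To make the requirements above concrete, here is a minimal sketch of turning a syslog line into separate fields (date, severity, tag, message) and counting entries per hour. It assumes the classic BSD syslog layout (`<PRI>Mmm dd hh:mm:ss host tag: message`); the names `parse_line` and `hourly_counts` are hypothetical, and the year is passed in explicitly because that layout does not carry one.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout: <PRI>Mmm dd hh:mm:ss host tag[pid]: message
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:\[\s]+)(?:\[\d+\])?: "
    r"(?P<message>.*)$"
)


def parse_line(line, year):
    """Split one syslog line into fields; return None if it does not match."""
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    pri = int(m.group("pri"))
    # %b depends on the locale; this sketch assumes English month names.
    ts = datetime.strptime(f"{year} {m.group('timestamp')}", "%Y %b %d %H:%M:%S")
    return {
        "facility": pri // 8,  # PRI = facility * 8 + severity
        "severity": pri % 8,
        "timestamp": ts,
        "host": m.group("host"),
        "tag": m.group("tag"),
        "message": m.group("message"),
    }


def hourly_counts(entries):
    """Count entries per (hour, severity) pair."""
    counts = Counter()
    for entry in entries:
        hour = entry["timestamp"].replace(minute=0, second=0, microsecond=0)
        counts[(hour, entry["severity"])] += 1
    return counts
```

Once the fields are split out like this, the date, severity and tag filters become lookups on structured values rather than string searches over the raw message.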

We expect the database to grow to several TB of data (at ~50K inserts per second), so sharding is a must. Queries are infrequent, because they are mainly run by the developers of the bigger project, but a result needs to be returned within a few seconds.

Right now, all the machines share common (and slow) storage. So for scalability, I suppose we need to make the best use of memory and multithreading for sharding to make sense.

My basic understanding so far is that MongoDB has more features, such as regex matching and sorting of results, and is easier to set up with a decent configuration, while Cassandra seems more scalable (by simply adding servers) and also has a few neat features, such as putting a TTL on data.

Radu Gheorghe asked Dec 31 '11 16:12


1 Answer

Sparse, column-oriented datastores such as Apache Cassandra are excellent at aggregating time series data. See the following articles for examples:

  • Basic time series with Cassandra
  • 4 Months with Cassandra
  • Understanding the Cassandra data model
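As a rough illustration of one commonly described time series layout for wide-row stores, the sketch below models it in plain Python rather than against a real Cassandra cluster. The assumption is that each row key combines a source name with an hourly bucket, and that entries inside a row are kept sorted by timestamp, so a time-range query only touches the rows for the hours it spans. The class `TimeSeriesStore` and its methods are hypothetical names for illustration, not part of any Cassandra client API.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime, timedelta


class TimeSeriesStore:
    """In-memory model of hourly-bucketed wide rows (illustration only)."""

    def __init__(self):
        # row key (source, "YYYYMMDDHH") -> list of (timestamp, value), sorted
        self._rows = defaultdict(list)

    @staticmethod
    def _bucket(ts):
        # Hourly bucket; a daily bucket would use "%Y%m%d" instead.
        return ts.strftime("%Y%m%d%H")

    def insert(self, source, ts, value):
        # insort keeps each row ordered by timestamp, like sorted column names.
        insort(self._rows[(source, self._bucket(ts))], (ts, value))

    def query(self, source, start, end):
        """Return entries with start <= timestamp <= end, in time order."""
        results = []
        hour = start.replace(minute=0, second=0, microsecond=0)
        while hour <= end:
            row = self._rows.get((source, self._bucket(hour)), [])
            results.extend((t, v) for t, v in row if start <= t <= end)
            hour += timedelta(hours=1)
        return results
```

Bucketing by hour keeps any single row from growing without bound and lines up with hourly analytics, since one row holds exactly one hour of entries for a source. The bucket size is a tuning choice that depends on the insert rate.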
zznate answered Oct 04 '22 16:10