After analyzing several gigabytes of log files with grep and the like, I was wondering how to make this easier by logging into a database instead. What database would be appropriate for this purpose? A vanilla SQL database works, of course, but provides lots of transactional guarantees etc. which you don't need here, and which might make it slow if you work with gigabytes of data and very fast insertion rates. So a NoSQL database could be the right answer (compare this answer for some suggestions). Some requirements for the database would be:
Update: There are already some SO questions on this: Database suggestion for processing/reporting on large amount of log file type data and What are good NoSQL and non-relational database solutions for audit/logging database. However, I am curious which databases fulfill which requirements.
After having tried a lot of NoSQL solutions, my best bets would be:
Riak + Riak Search scale easily (REALLY!) and allow free-form queries over your data. You can also easily mix data schemas and maybe even compress data with Innostore as a backend.
MongoDB is really fast on a single node and offers index creation, but it is annoying to scale beyond several gigabytes of data if you really want to use indexes and not slow down to a crawl. As soon as your working data set no longer fits in memory, it becomes a problem.
MySQL/PostgreSQL are still pretty fast and allow free-form queries thanks to the usual B+tree indexes. Look at PostgreSQL's partial indexes if some of the fields don't show up in every record. They also offer compressed tables, and since the schema is fixed, you don't store your column names over and over again (which is what usually happens with a lot of the NoSQL solutions).
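To make the partial-index suggestion concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in so it runs without a server; the `CREATE INDEX ... WHERE` statement has the same form in PostgreSQL. The table `log_entries`, its columns, and the sample rows are all hypothetical and only serve as illustration.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (assumption for runnability).
conn = sqlite3.connect(":memory:")

# Hypothetical log table: user_id is only present on some records.
conn.execute(
    """
    CREATE TABLE log_entries (
        id      INTEGER PRIMARY KEY,
        ts      TEXT NOT NULL,
        level   TEXT NOT NULL,
        message TEXT NOT NULL,
        user_id INTEGER
    )
    """
)

# Partial index: only rows that actually carry a user_id are indexed,
# so the index stays small even if most log lines have no user.
conn.execute(
    "CREATE INDEX idx_log_user ON log_entries (user_id) "
    "WHERE user_id IS NOT NULL"
)

rows = [
    ("2011-05-01T10:00:00", "INFO", "service started", None),
    ("2011-05-01T10:00:05", "WARN", "login failed", 42),
    ("2011-05-01T10:00:09", "INFO", "login ok", 42),
    ("2011-05-01T10:01:00", "INFO", "heartbeat", None),
]
conn.executemany(
    "INSERT INTO log_entries (ts, level, message, user_id) VALUES (?, ?, ?, ?)",
    rows,
)

# Free-form query on the sparsely populated field.
messages_for_user = [
    row[0]
    for row in conn.execute(
        "SELECT message FROM log_entries WHERE user_id = ? ORDER BY ts", (42,)
    )
]

# Names of the indexes defined on the table.
index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'log_entries'"
    )
]
```

Whether the planner actually picks the partial index depends on the database and the query, so it is worth checking with `EXPLAIN` on real data.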
CouchDB is nice if you already know the queries you want to see; its incremental map/reduce-based views are a great system for that.
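The idea behind an incrementally maintained map/reduce view can be sketched in a few lines. This is a toy model in plain Python, not CouchDB's API (real CouchDB views are typically written as JavaScript functions in design documents); the class `IncrementalView`, the function `by_level`, and the sample documents are hypothetical.

```python
from collections import Counter


class IncrementalView:
    """Toy model of an incrementally updated map/reduce view (not CouchDB's API)."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.counts = Counter()  # reduced result: number of emitted rows per key
        self.last_seq = 0        # number of documents already processed

    def update(self, docs):
        # Only documents appended since the last update are mapped again.
        for doc in docs[self.last_seq:]:
            for key in self.map_fn(doc):
                self.counts[key] += 1
        self.last_seq = len(docs)

    def query(self, key):
        return self.counts[key]


def by_level(doc):
    # Map function: emit the log level of each document as the key.
    yield doc["level"]


docs = [
    {"level": "INFO", "msg": "service started"},
    {"level": "ERROR", "msg": "disk full"},
]
view = IncrementalView(by_level)
view.update(docs)

docs.append({"level": "ERROR", "msg": "write failed"})
view.update(docs)  # maps only the one new document

error_count = view.query("ERROR")
info_count = view.query("INFO")
```

The trade-off is the one the answer points at: the query (here, "count per log level") has to be known when the view is defined, and in return each lookup is cheap because the result is kept up to date as documents arrive.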
There are a lot of different options that you could look into. You could use Hive for your analytics and Flume to consume and load the log files. MongoDB might also be a good option for you; take a look at this article on log analytics with MongoDB, Ruby, and Google Charts.