 

What database would you use for logging (i.e. as a logfile replacement)?

After analyzing some gigabytes of logfiles with grep and the like, I was wondering how to make this easier by logging into a database instead. What database would be appropriate for this purpose? A vanilla SQL database works, of course, but it provides lots of transactional guarantees etc. that you don't need here, and that might make it slow once you work with gigabytes of data and very fast insertion rates. So a NoSQL database could be the right answer (compare this answer for some suggestions). Some requirements for the database would be:

  • Ability to cope with gigabytes or maybe even terabytes of data
  • Fast insertion
  • Multiple indexes on each entry should be possible (e.g. time, session id, URL etc.; see the sketch after this list)
  • If possible, it should store the data in a compressed form, since logfiles are usually extremely repetitive.
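
To make the index requirement concrete, here is a minimal sketch using Python's stdlib sqlite3 module (the table and column names are just placeholders, not a recommendation of SQLite itself):

```python
import sqlite3  # stdlib; any SQL database would do for this illustration

conn = sqlite3.connect("logs.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS logs (
        ts         TEXT,  -- timestamp of the entry
        session_id TEXT,  -- session the request belongs to
        url        TEXT,  -- requested URL
        message    TEXT   -- raw log line
    );
    -- One index per field we expect to filter on:
    CREATE INDEX IF NOT EXISTS idx_logs_ts      ON logs(ts);
    CREATE INDEX IF NOT EXISTS idx_logs_session ON logs(session_id);
    CREATE INDEX IF NOT EXISTS idx_logs_url     ON logs(url);
""")

conn.execute("INSERT INTO logs VALUES (?, ?, ?, ?)",
             ("2010-11-25T16:11:00", "abc123", "/index.html",
              "GET /index.html 200"))
conn.commit()
```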

Update: There are already some SO questions on this: Database suggestion for processing/reporting on large amount of log file type data and What are good NoSQL and non-relational database solutions for audit/logging database. However, I am curious which databases fulfill which requirements.

asked Nov 25 '10 by Hans-Peter Störr

2 Answers

After having tried a lot of NoSQL solutions, my best bets would be:

  • Riak + Riak Search for great scalability
  • unnormalized data in MySQL/PostgreSQL
  • MongoDB if you don't mind waiting
  • CouchDB if you KNOW what you're searching for

Riak + Riak Search scale easily (REALLY!) and allow free-form queries over your data. You can also easily mix data schemas, and maybe even compress data by using Innostore as a backend.
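
For flavor, a rough sketch against the historical Basho Python client (`pip install riak`); the node address, bucket name, and the search index are assumptions:

```python
import riak  # historical Basho client: pip install riak

client = riak.RiakClient(pb_port=8087)  # assumes a local Riak node
bucket = client.bucket("logs")

# Schema-free insert; entries in the same bucket can carry different fields.
entry = bucket.new("2010-11-25T16:11:00-abc123",
                   data={"session_id": "abc123", "url": "/index.html"})
entry.store()

# Free-form query via Riak Search (assumes a search index named "logs"
# has already been created and attached to the bucket):
results = client.fulltext_search("logs", "session_id:abc123")
```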

MongoDB is annoying to scale over several gigabytes of data if you really want to use indexes and not slow down to a crawl. Its single-node performance is really fast, and index creation is easy. But as soon as your working data set no longer fits in memory, it becomes a problem...
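
For illustration, a minimal PyMongo sketch of the index-heavy logging pattern described above (database and collection names are made up; assumes a local mongod):

```python
from pymongo import ASCENDING, MongoClient

client = MongoClient()        # assumes a local mongod on the default port
logs = client.logdb.logs      # database/collection names are made up

# One secondary index per query field; these are exactly what stops
# scaling gracefully once the working set no longer fits in RAM.
logs.create_index([("ts", ASCENDING)])
logs.create_index([("session_id", ASCENDING)])

logs.insert_one({"ts": "2010-11-25T16:11:00",
                 "session_id": "abc123",
                 "url": "/index.html"})

for doc in logs.find({"session_id": "abc123"}).sort("ts", ASCENDING):
    print(doc["url"])
```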

MySQL/PostgreSQL is still pretty fast and allows free-form queries thanks to the usual B+-tree indexes. Look at Postgres' partial indexes if some of the fields don't show up in every record. They also offer compressed tables, and since the schema is fixed, you don't save your field names over and over again (which is what usually happens with a lot of the NoSQL solutions).
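
As a sketch of the partial-index idea (assuming a reachable PostgreSQL 9.5+ and psycopg2; the connection string, table, and index names are placeholders):

```python
import psycopg2  # assumes PostgreSQL 9.5+ (for CREATE ... IF NOT EXISTS)

conn = psycopg2.connect("dbname=logs")  # connection string is an assumption
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS logs (
        ts         timestamptz,
        session_id text,
        url        text
    )
""")

# Partial index: only rows that actually carry a session id are indexed,
# which keeps the index small when the field is sparse.
cur.execute("""
    CREATE INDEX IF NOT EXISTS idx_logs_session
        ON logs (session_id)
        WHERE session_id IS NOT NULL
""")
conn.commit()
```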

CouchDB is nice if you already know the queries you want to run; its incremental map/reduce-based views are a great system for that.
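
A rough sketch of such a predefined view over CouchDB's plain HTTP API (stdlib only; assumes a local, unauthenticated CouchDB with an existing logs database; all names are placeholders):

```python
import json
import urllib.request

BASE = "http://localhost:5984"  # assumes a local, unauthenticated CouchDB

def put(path, body):
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(body).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req).read()

# Design document: the map function (JavaScript, as CouchDB expects)
# emits one row per URL; the built-in _count reduce yields hits per URL.
put("/logs/_design/stats", {
    "views": {
        "hits_by_url": {
            "map": "function(doc) { emit(doc.url, 1); }",
            "reduce": "_count",
        }
    }
})

# Querying, grouped by URL (the view is updated incrementally on change):
#   GET /logs/_design/stats/_view/hits_by_url?group=true
```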

answered Sep 19 '22 by Marc Seeger


There are a lot of different options you could look into. You could use Hive for your analytics and Flume to consume and load the log files. MongoDB might also be a good option for you; take a look at this article on log analytics with MongoDB, Ruby, and Google Charts.
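
As a sketch of that pipeline, hypothetical HiveQL over a directory that Flume delivers log files into (the path, table layout, and delimiter are assumptions for illustration):

```python
# Hypothetical HiveQL for querying Flume-delivered log files in place;
# everything here (paths, columns, delimiter) is an assumed example.
hiveql = """
CREATE EXTERNAL TABLE access_logs (
    ts         STRING,
    session_id STRING,
    url        STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
LOCATION '/flume/logs/';

SELECT url, COUNT(*) AS hits
FROM access_logs
GROUP BY url
ORDER BY hits DESC
LIMIT 20;
"""
print(hiveql)  # e.g. save to a file and run with: hive -f query.hql
```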

answered Sep 18 '22 by Jeremiah Peschka