Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Free data warehousing systems--specifically, for data storage

I am building out some reporting stuff for our website (a decent sized site that gets several million pageviews a day), and am wondering if there are any good free/open source data warehousing systems out there.

Specifically, I am looking for only something to store the data--I plan to build a custom front end/UI to it so that it shows the information we care about. However, I don't want to have to build a customized database for this, and while I'm pretty sure an SQL database would not work here, I'm not sure what to use exactly. Any pointers to helpful articles would also be appreciated.

Edit: I should mention--one DB I have looked at briefly was MongoDB. It seems like it might work, but their "Use Cases" specifically mention data warehousing as "Less Well Suited": http://www.mongodb.org/display/DOCS/Use+Cases . Also, it doesn't seem to be specifically targeted towards data warehousing.

like image 604
Sam Lee Avatar asked Dec 07 '22 05:12

Sam Lee


2 Answers

http://www.hypertable.org/ might be what you are looking for is (and I'm going by your descriptions above here) something to store large amounts of logged data with normalization. i.e. a visitor log.

Hypertable is based on google's bigTable project. see http://code.google.com/p/hypertable/wiki/PerformanceTestAOLQueryLog for benchmarks

you lose the relational capabilities of SQL based dbs but you gain a lot in performance. you could easily use hypertable to store millions of rows per hour (hard drive space withstanding).

hope that helps

like image 57
Mike Valstar Avatar answered Dec 11 '22 10:12

Mike Valstar


I may not understand the problem correctly -- however, if you find some time to (re)visit Kimball’s “The Data Warehouse Toolkit”, you will find that all it takes for a basic DW is a plain-vanilla SQL database, in other words you could build a decent DW with MySQL using MyISAM for the storage engine. The question is only in desired granularity of information – what you want to keep and for how long. If your reports are mostly periodic, and you implement a report storage or cache, than you don’t need to store pre-calculated aggregations (no need for cubes). In other words, Kimball star with cached reporting can provide decent performance in many cases. You could also look at the community edition of “Pentaho BI Suite” (open source) to get a quick start with ETL, analytics and reporting -- and experiment a bit to evaluate the performance before diving into custom development. Although this may not be what you were expecting, it may be worth considering.

like image 27
Damir Sudarevic Avatar answered Dec 11 '22 10:12

Damir Sudarevic