I would really appreciate if somebody put some light on the choice of HBase as a data storage engine for OpenTSDB?
Which other choices, such as Whisper (Graphite front-end + Carbon persistence), were considered?
How is a column-oriented db such as HBase a better choice for time-series data?
Time-series applications (sensor data, application/system logging events, user interactions etc) present a new set of data storage challenges: very high velocity and very high volume of data. This talk will present the recent development in Apache HBase that make it a good fit for time-series applications.
Storing time series data. Time series data is best stored in a time series database (TSDB) built specifically for handling metrics and events that are time-stamped. This is because time series data is often ingested in massive volumes that require a purpose-built database designed to handle that scale.
I chose HBase because it scales. Whisper is much like RRD, it's a fixed-size database, it must destroy data in order to work within its space constraints. HBase offers the following properties that make it very well suited for large scale time series databases:
The fact that HBase is column oriented wasn't nearly as important a consideration as the fact that it's a big sorted key-value system that really scales.
All RRD-based and RRD-derived tools couldn't satisfy the scale requirements of being able to accurately store billions and billions of data points forever for very cheap (just a few bytes of actual disk space per data point).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With