100 million+ rows

Tags: sql

I have a system which records some measured values every second. What is the best way to store this trend data, i.e. the values corresponding to each specific second?

1 day = 86,400 seconds
1 month = 2,592,000 seconds

There are around 1000 values to keep track of every second.

Currently there are 50 tables, each grouping the trend data into 20 columns. These tables contain more than 100 million rows.

    TREND_TIME   datetime (clustered index)
    TREND_DATA1  real
    TREND_DATA2  real
    ...
    TREND_DATA20 real
asked Jul 20 '10 by kerem

2 Answers

Have you considered RRDTool? It provides a round-robin database, or circular buffer, for time-series data. You can store data at whatever interval you like, then define consolidation points and a consolidation function (for example sum, min, max, avg) for a given period: 1 second, 5 seconds, 2 days, etc. Because it knows in advance which consolidation points you want, it doesn't need to keep all the raw data points once they've been aggregated.
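
For illustration, here is a minimal sketch using the python-rrdtool bindings (the file name, data source name, and retention windows are my own assumptions, not from the question): a 1-second archive covers one day of raw samples, while two consolidated archives keep 5-minute averages and maxima for a month.

    # Sketch: one RRD holding a single measured value, via python-rrdtool.
    import rrdtool

    rrdtool.create(
        "trend_value1.rrd",          # hypothetical file name
        "--step", "1",               # expect one sample per second
        "DS:value1:GAUGE:5:U:U",     # data source: 5 s heartbeat, no min/max bounds
        "RRA:AVERAGE:0.5:1:86400",   # 1 s resolution, kept for 1 day (86,400 rows)
        "RRA:AVERAGE:0.5:300:8640",  # 5 min averages, kept for 30 days (2,592,000 s)
        "RRA:MAX:0.5:300:8640",      # 5 min maxima over the same window
    )

    # Feed one sample per second; "N" means "now".
    rrdtool.update("trend_value1.rrd", "N:42.7")

Once the 1-second archive rolls over, only the consolidated 5-minute values remain, which is what keeps the database at a fixed size.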

Ganglia and Cacti use this under the covers and it's quite easy to use from many languages.

If you do need all the raw data points, consider using it just for the aggregation.

answered Sep 28 '22 by Robert Christie


I would change the data-saving approach: instead of saving the 'raw' values one by one, buffer 5-20 minutes of data in an in-memory array (on the business-logic side), compress that array with an LZ-based algorithm, and then store the result in the database as binary data. It is also worth saving Max/Min/Avg/etc. summary info alongside each binary chunk.

When you need to process the data, you can do it chunk by chunk, which keeps your application's memory profile low. This approach is a little more complex, but it is very scalable in terms of memory and processing.
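
A rough Python sketch of that approach (the chunk size, helper names, and table layout are illustrative assumptions, not part of the answer):

    # Sketch: pack a window of samples, compress with zlib (DEFLATE,
    # whose LZ77 stage provides the LZ-based compression), and keep
    # summary stats next to the blob.
    import struct
    import zlib

    CHUNK_SECONDS = 600  # e.g. 10 minutes of 1 Hz samples

    def pack_chunk(samples):
        """Serialize floats, compress them, and compute min/max/avg."""
        raw = struct.pack(f"<{len(samples)}f", *samples)  # 4-byte little-endian floats
        blob = zlib.compress(raw, 6)
        stats = (min(samples), max(samples), sum(samples) / len(samples))
        return blob, stats

    def unpack_chunk(blob):
        """Decompress a stored chunk back into a list of floats."""
        raw = zlib.decompress(blob)
        return list(struct.unpack(f"<{len(raw) // 4}f", raw))

    # Hypothetical storage layout, one row per value per chunk:
    #   TREND_CHUNK(value_id, start_time, min_val, max_val, avg_val, data BLOB)

Queries that only need Max/Min/Avg can then be answered from the summary columns without touching the compressed blobs at all.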

Hope this helps.

answered Sep 28 '22 by Gilad