
Best way to store Time series data with heavy writing and high aggregation. (~1 billion points)

I'm looking for a way to store data with a timestamp.

Each timestamp might have 1 to 10 data fields.

Can I store data as (time, key, value) using a simple data solution or SQL? How would that compare to a NoSQL solution like Mongo, where I can store {time: ..., key1: ..., key2: ...}?

It will store about 10 data points per second, with a maximum of around 10 fields each, and the data might be collected for as long as 10 years, easily accumulating a billion records. The database should be able to support graphing the data with time-range queries.
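As a rough sanity check on the scale (assuming the full 10-timestamps-per-second rate runs continuously; slower channels would bring the real total down toward the ~1 billion in the title):

```python
points_per_second = 10
seconds_per_year = 60 * 60 * 24 * 365  # 31,536,000
years = 10

# One row per timestamp; fields are columns within the row.
total_rows = points_per_second * seconds_per_year * years
print(total_rows)  # 3153600000 -- a bit over 3 billion rows at the full rate
```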

It should be able to handle a heavy write frequency, ~100 per second (OK, this is not that high, but still...), while at the same time handling queries that return about a million records (maybe even more).

The data itself is very simple: just electronic measurements. Some need to be measured at a high frequency (~every 100 milliseconds), and others every minute or so.

Can anyone who used something like this comment on the pluses and minuses of the method they used?

(Obviously this is a very specific scenario, so this definitely is not intended to turn into a "what's the best database" kind of question.)

Sample data:

{ _id: Date(2013-05-08 18:48:40.078554),
  V_in: 2.44,
  I_in: 0.00988,
  I_max: 0.11,
},

{ _id: Date(2013-05-08 18:48:40.078325),
  I_max: 0.100,
},

{ _id: Date(2001-08-09 23:48:43.083454),
  V_out: 2.44,
  I_in: 0.00988,
  I_max: 0.11,
},

Thank you.

xcorat asked May 08 '13 19:05


1 Answer

For simplicity, I would just make a table of timestamps with a column for each measurement point. An integer primary key is technically redundant, since the timestamp uniquely identifies a measurement point, but it's easier to refer to a particular row by number than by timestamp. You will have NULLs for any parameter that was not measured at a given timestamp, which takes up a few extra bits per row (log base 2 of the number of columns, rounded up), but you also won't have to do any joins. It is true that you will have to alter the schema if you decide you want to add columns later, but that's really not too difficult, and you could just make another separate table that keys on this one.
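A minimal sketch of that wide-table layout, using SQLite purely for illustration (the column names come from the sample data in the question; a production setup would use a server database with appropriate indexing):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE measurements (
        id    INTEGER PRIMARY KEY,   -- surrogate key; easier to reference than a timestamp
        ts    TEXT NOT NULL UNIQUE,  -- the timestamp still uniquely identifies the row
        V_in  REAL,                  -- NULL when this parameter was not measured
        V_out REAL,
        I_in  REAL,
        I_max REAL
    )
""")

# Sparse rows: unmeasured parameters are simply NULL -- no joins needed.
conn.execute("INSERT INTO measurements (ts, V_in, I_in, I_max) VALUES (?, ?, ?, ?)",
             ("2013-05-08 18:48:40.078554", 2.44, 0.00988, 0.11))
conn.execute("INSERT INTO measurements (ts, I_max) VALUES (?, ?)",
             ("2013-05-08 18:48:40.078325", 0.100))

# Time-range query of the kind you'd use for graphing:
rows = conn.execute("""
    SELECT ts, V_in, I_max FROM measurements
    WHERE ts BETWEEN '2013-05-08 18:48:40' AND '2013-05-08 18:48:41'
    ORDER BY ts
""").fetchall()
print(rows)
```

The `UNIQUE` constraint on `ts` gives you an index for the time-range scans for free.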

Please see here for an example with your data: http://www.sqlfiddle.com/#!2/e967c/4

I would recommend making some dummy databases of large size to make sure whatever structure you use still performs adequately.

The (time, key, value) suggestion smells like EAV (entity-attribute-value), which I would avoid if you're planning on scaling.
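For contrast, here is what the (time, key, value) layout looks like; reassembling one timestamp's fields into a single row requires a pivot (one conditional aggregate or join per field), which is part of what makes EAV painful at scale. Again SQLite, purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eav (ts TEXT, key TEXT, value REAL)")

# One row per (timestamp, field) pair -- three rows for a single measurement.
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    ("2013-05-08 18:48:40.078554", "V_in", 2.44),
    ("2013-05-08 18:48:40.078554", "I_in", 0.00988),
    ("2013-05-08 18:48:40.078554", "I_max", 0.11),
])

# Pivoting back to one row per timestamp needs an aggregate per field:
row = conn.execute("""
    SELECT ts,
           MAX(CASE WHEN key = 'V_in'  THEN value END) AS V_in,
           MAX(CASE WHEN key = 'I_max' THEN value END) AS I_max
    FROM eav
    GROUP BY ts
""").fetchone()
print(row)  # ('2013-05-08 18:48:40.078554', 2.44, 0.11)
```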

engineerC answered Nov 07 '22 04:11