Time-series data in JSON

I need to model 1,000,000+ data points in JSON. I am thinking of two ways of doing this:

a) Array of objects:

[{"time":123456789,"value":1432423},{"time":123456790,"value":1432424},....]

or

b) Nested arrays:

[[123456789,1432423],[123456790,1432424],....]

Comparing the two naively, the latter feels faster because it uses fewer characters, although it is less descriptive. Is b) really faster than a)? Which one would you choose, and why?

Is there a third approach?
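A quick way to check the size side of this question is to serialize the same points both ways and compare the string lengths. This is a sketch on made-up data (the helper `makePoints` is hypothetical), and it only measures size; parse speed depends on the JSON parser and runtime and would need its own benchmark.

```javascript
// Hypothetical helper: build n synthetic points with consecutive timestamps.
function makePoints(n) {
  const points = [];
  for (let i = 0; i < n; i++) {
    points.push({ time: 123456789 + i, value: 1432423 + i });
  }
  return points;
}

const points = makePoints(1000);

// a) array of objects: every point repeats the "time" and "value" keys
const jsonA = JSON.stringify(points);

// b) nested [time, value] pairs: keys are dropped entirely
const jsonB = JSON.stringify(points.map((p) => [p.time, p.value]));

console.log(jsonA.length, jsonB.length); // 35001 20001
```

With 9-digit times and 7-digit values, the repeated keys make a) about 75% larger than b) here; the gap shrinks as the numbers get longer relative to the key names.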

asked May 12 '15 at 11:05 by Ali Salehi



2 Answers

{"time":[123456789,123456790,...], "value":[1432423,1432424,...]}

Why?

  1. Iterating over an array of primitives is faster.
  2. The JSON size is comparable to b), but you don't lose the "column" information.
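Converting between the array-of-objects form and this column form is a simple loop in each direction. A minimal sketch, assuming every point has exactly `time` and `value` fields; `toColumns` and `fromColumns` are hypothetical names, not the API of the package linked below.

```javascript
// Hypothetical helper: turn [{time, value}, ...] into {time: [...], value: [...]}.
function toColumns(points) {
  const columns = { time: [], value: [] };
  for (const p of points) {
    columns.time.push(p.time);
    columns.value.push(p.value);
  }
  return columns;
}

// Hypothetical helper: rebuild [{time, value}, ...] from the parallel arrays.
function fromColumns(columns) {
  return columns.time.map((t, i) => ({ time: t, value: columns.value[i] }));
}

const points = [
  { time: 123456789, value: 1432423 },
  { time: 123456790, value: 1432424 },
];

const json = JSON.stringify(toColumns(points));
console.log(json); // {"time":[123456789,123456790],"value":[1432423,1432424]}
```

Each key name appears once per column instead of once per point, which is why the size ends up close to b).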

This npm package could also be of interest: https://github.com/michaelwittig/fliptable

answered Oct 10 '22 at 01:10 by hellomichibye



If your time series models some continuous function, especially one sampled at regular time intervals, delta compression can give a much more efficient representation, even if you are still using JSON:

[
    {"time":10001,"value":12345},
    {"time":10002,"value":12354},
    {"time":10003,"value":12354},
    {"time":10010,"value":12352}
]

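This can be represented as two arrays, one for the times and one for the values, each holding the first value followed by the differences between consecutive points: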

[[10001,1,1,7],[12345,9,0,-2]]

That is roughly four times shorter than the original.
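The encoding step can be sketched as follows. This is an illustration, not the answerer's code: `deltaEncode` is a hypothetical name, and it writes a zero delta as an explicit `0` because JSON has no empty array slots. It assumes integer times and values; with floating-point data, the deltas and their later re-summing can introduce rounding error.

```javascript
// Hypothetical helper: delta-encode [{time, value}, ...] into [timeDeltas, valueDeltas].
// The first entry of each array is absolute; the rest are differences from the previous point.
function deltaEncode(points) {
  const times = [];
  const values = [];
  points.forEach((p, i) => {
    if (i === 0) {
      times.push(p.time);
      values.push(p.value);
    } else {
      times.push(p.time - points[i - 1].time);
      values.push(p.value - points[i - 1].value);
    }
  });
  return [times, values];
}

const points = [
  { time: 10001, value: 12345 },
  { time: 10002, value: 12354 },
  { time: 10003, value: 12354 },
  { time: 10010, value: 12352 },
];

console.log(JSON.stringify(deltaEncode(points))); // [[10001,1,1,7],[12345,9,0,-2]]
```

The small deltas are what make this compact: for a slowly changing series, most entries shrink to one or two digits.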

The original can be reconstructed by adding each delta to the previous point:

[{time: a[0][0], value: a[1][0]}, {time: a[0][0] + (a[0][1] || 1), value: a[1][0] + (a[1][1] || 0)}, ...]
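The reconstruction generalizes to a running sum over both arrays. A minimal sketch; `deltaDecode` is a hypothetical name. It keeps the answer's defaults for missing entries (1 for time, 0 for value) but uses `??` instead of `||`, so that an explicit `0` time delta is not silently turned into 1.

```javascript
// Hypothetical helper: rebuild [{time, value}, ...] from [timeDeltas, valueDeltas].
// A null or missing delta falls back to 1 for time and 0 for value.
function deltaDecode([timeDeltas, valueDeltas]) {
  const points = [];
  let time = 0;
  let value = 0;
  for (let i = 0; i < timeDeltas.length; i++) {
    time += timeDeltas[i] ?? 1;
    value += valueDeltas[i] ?? 0;
    points.push({ time, value });
  }
  return points;
}

const encoded = JSON.parse("[[10001,1,1,7],[12345,9,0,-2]]");
console.log(JSON.stringify(deltaDecode(encoded)));
// [{"time":10001,"value":12345},{"time":10002,"value":12354},{"time":10003,"value":12354},{"time":10010,"value":12352}]
```

Decoding is a single linear pass, so the cost of the compact format is small compared with parsing the JSON itself.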
answered Oct 10 '22 at 03:10 by George Polevoy
