I need to store a bunch of time series in a database, but I'm concerned about both the size and the processing time.
To reduce size, in another project I have already stored whole time series as zipped JSON, and this is quite efficient in terms of storage space. But the problem is that to look up any data you first have to retrieve the whole series, unzip it, and deserialize it, and of course you can't use the database's built-in querying capabilities like SQL SELECT/WHERE.
So you spend bandwidth to fetch the data, CPU to unzip it, and RAM to hold it, even if you only need a single point...
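To make the trade-off concrete, here is a minimal sketch of that zipped/JSON approach (Python; the timestamps and values are invented for illustration):

    import gzip
    import json

    # Each series is serialized as JSON and gzip-compressed into one blob.
    series = [("2012-01-01T00:00:00", 1.5), ("2012-01-01T00:01:00", 1.7)]
    blob = gzip.compress(json.dumps(series).encode("utf-8"))

    # Reading even a single point forces the full round trip:
    # fetch the blob, decompress it, parse the whole series...
    points = json.loads(gzip.decompress(blob).decode("utf-8"))
    one_point = points[0]  # ...and only then pick out one value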
This was not an issue in the previous project, because the series were always manipulated as a whole, essentially to be displayed in charts or Excel, but this time I'd like at least a minimal ability to search the data in the database.
To allow this flexibility in data manipulation, e.g. with SQL, there is the "standard format": one row per date, as sketched below. But I have the same two concerns as above: the resulting size and the processing time.
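For reference, the one-row-per-date layout could look like this (a sketch using SQLite as a stand-in; the table and column names are mine):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("""
        CREATE TABLE points (
            series_id INTEGER NOT NULL,
            ts        TEXT    NOT NULL,  -- ISO-8601 timestamp
            value     REAL    NOT NULL,
            PRIMARY KEY (series_id, ts)
        )
    """)
    con.execute("INSERT INTO points VALUES (1, '2012-01-01T00:00:00', 1.5)")
    con.execute("INSERT INTO points VALUES (1, '2012-01-01T00:01:00', 1.7)")

    # The database can now answer range queries directly, no unzip needed.
    rows = con.execute(
        "SELECT ts, value FROM points"
        " WHERE series_id = ? AND ts >= ? AND ts < ?",
        (1, "2012-01-01T00:00:00", "2012-01-02T00:00:00"),
    ).fetchall()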
I can choose any free database, so NoSQL is welcome too if it can help.
Do you have any suggestions or, better yet, some feedback from experience?
Thanks for any input.
Check out TempoDB: http://tempo-db.com
I'm a co-founder, and we built the service to solve this exact problem.
The access pattern is writing data in time order, rarely editing it (the data is largely immutable), and then reading it back by time range.
The fundamental issue you'll face is indexing on a timestamp when there are many billions of rows. You want to decouple query performance from the total size of the underlying dataset, which will always keep growing, at least linearly. We do all that stuff... and more :)
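To illustrate the decoupling idea, here is a generic sketch of time partitioning (this is not a description of TempoDB's internals, which aren't covered here): rows are split into per-month tables, so a range query only touches the partitions that overlap the requested interval, no matter how many years of data sit in the other tables.

    import sqlite3

    con = sqlite3.connect(":memory:")

    def partition_name(ts):
        # e.g. "2012-01-15T08:00:00" -> "points_2012_01"
        return "points_%s_%s" % (ts[0:4], ts[5:7])

    def insert(series_id, ts, value):
        table = partition_name(ts)
        con.execute("CREATE TABLE IF NOT EXISTS %s"
                    " (series_id INTEGER, ts TEXT, value REAL,"
                    "  PRIMARY KEY (series_id, ts))" % table)
        con.execute("INSERT INTO %s VALUES (?, ?, ?)" % table,
                    (series_id, ts, value))

    insert(1, "2012-01-15T08:00:00", 1.5)
    insert(1, "2012-02-15T08:00:00", 1.7)

    # A one-month query reads a single small table, so its cost stays
    # flat as the total dataset keeps growing.
    rows = con.execute("SELECT ts, value FROM points_2012_01"
                       " WHERE series_id = 1").fetchall()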