Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

key-value store for time series data?

I've been using SQL Server to store historical time series data for a couple hundred thousand objects, observed about 100 times per day. I'm finding that queries (give me all values for object XYZ between time t1 and time t2) are too slow (for my needs, slow is more then a second). I'm indexing by timestamp and object ID.

I've entertained the thought of using somethings a key-value store like MongoDB instead, but I'm not sure if this is an "appropriate" use of this sort of thing, and I couldn't find any mentions of using such a database for time series data. ideally, I'd be able to do the following queries:

  • retrieve all the data for object XYZ between time t1 and time t2
  • do the above, but return one date point per day (first, last, closed to time t...)
  • retrieve all data for all objects for a particular timestamp

the data should be ordered, and ideally it should be fast to write new data as well as update existing data.

it seems like my desire to query by object ID as well as by timestamp might necessitate having two copies of the database indexed in different ways to get optimal performance...anyone have any experience building a system like this, with a key-value store, or HDF5, or something else? or is this totally doable in SQL Server and I'm just not doing it right?

like image 901
toasteroven Avatar asked Nov 05 '09 21:11

toasteroven


People also ask

What is the best way to store time series data?

Storing time series data. Time series data is best stored in a time series database (TSDB) built specifically for handling metrics and events that are time-stamped. This is because time series data is often ingested in massive volumes that require a purpose-built database designed to handle that scale.

What database is used for time series data?

Relational database management systems (RDBS), which are often considered general-purpose database systems, can be used to store and retrieve time series data.

Where do we store data as key-value?

A key-value database is a type of nonrelational database that uses a simple key-value method to store data. A key-value database stores data as a collection of key-value pairs in which a key serves as a unique identifier. Both keys and values can be anything, ranging from simple objects to complex compound objects.

Is MongoDB good for time series data?

From the very beginning, developers have been using MongoDB to store time-series data. MongoDB can be an extremely efficient engine for storing and processing time-series data, but you'd have to know how to correctly model it to have a performant solution, but that wasn't as straightforward as it could have been.


1 Answers

It sounds like MongoDB would be a very good fit. Updates and inserts are super fast, so you might want to create a document for every event, such as:

{
   object: XYZ,
   ts : new Date()
}

Then you can index the ts field and queries will also be fast. (By the way, you can create multiple indexes on a single database.)

How to do your three queries:

retrieve all the data for object XYZ between time t1 and time t2

db.data.find({object : XYZ, ts : {$gt : t1, $lt : t2}})

do the above, but return one date point per day (first, last, closed to time t...)

// first
db.data.find({object : XYZ, ts : {$gt : new Date(/* start of day */)}}).sort({ts : 1}).limit(1)
// last
db.data.find({object : XYZ, ts : {$lt : new Date(/* end of day */)}}).sort({ts : -1}).limit(1)

For closest to some time, you'd probably need a custom JavaScript function, but it's doable.

retrieve all data for all objects for a particular timestamp

db.data.find({ts : timestamp})

Feel free to ask on the user list if you have any questions, someone else might be able to think of an easier way of getting closest-to-a-time events.

like image 113
kristina Avatar answered Nov 13 '22 08:11

kristina