MongoDB as a Time Series Database

I'm trying to use MongoDB as a time series database and was wondering if anyone could suggest how best to set it up for that scenario.

The time series data is very similar to a stock price history. I have a collection of data from a variety of sensors on different machines. There are values at billions of timestamps, and I would like to answer the following questions (preferably at the database level rather than the application level):

  1. For a given set of sensors and time interval, I want all the timestamps and sensor values that lie within that interval, ordered by time. Assume all the sensors share the same timestamps (they were all sampled at the same time).

  2. For a given set of sensors and time interval, I want every kth item (timestamp and corresponding sensor values) that lies within the given interval, ordered by time.
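As a rough sketch of the two queries, assume one document per sensor reading with hypothetical fields `sensor`, `timestamp` and `value`. The filter and sort below have the shapes PyMongo's `find()` and `sort()` accept; the helpers then group readings that share a timestamp into rows and keep every kth row on the client side. The function names are illustrative, not part of any library.

```python
from datetime import datetime, timedelta, timezone
from itertools import groupby, islice
from operator import itemgetter


def range_query(sensors, start, end):
    """Question 1: filter and sort spec for readings in [start, end).

    Assumes hypothetical fields 'sensor', 'timestamp' and 'value'.
    With PyMongo: collection.find(flt).sort(sort)
    """
    flt = {
        "sensor": {"$in": list(sensors)},
        "timestamp": {"$gte": start, "$lt": end},  # half-open interval
    }
    sort = [("timestamp", 1)]  # ascending by time
    return flt, sort


def group_by_timestamp(docs):
    """Collapse time-ordered readings into (timestamp, {sensor: value}) rows."""
    for ts, group in groupby(docs, key=itemgetter("timestamp")):
        yield ts, {d["sensor"]: d["value"] for d in group}


def every_kth(rows, k):
    """Question 2: keep rows 0, k, 2k, ... of a time-ordered iterable."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return list(islice(rows, 0, None, k))


if __name__ == "__main__":
    t0 = datetime(2011, 9, 10, tzinfo=timezone.utc)
    flt, sort = range_query(["s1", "s2"], t0, t0 + timedelta(hours=1))
    print(flt)
    print(sort)
```

Sorting by `timestamp` is what makes `groupby` correct here, because it only merges adjacent documents; every kth row is then every kth timestamp rather than every kth individual reading.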

Any recommendations on how best to set this up and run these queries?

Thanks for the suggestions.

Asked by sequoia on Sep 10 '11.

1 Answer

Obviously this is an old question, but I came across it while researching MongoDB for time series data. I thought it might be worth sharing the following approach: allocate complete documents in advance and perform update operations rather than new insert operations. Note that this approach was documented here and here.

Imagine you are storing a value every second. Consider the following document structure:

    {
      timestamp: ISODate("2013-10-10T23:06:37.000Z"),
      type: "spot_EURUSD",
      value: 1.2345
    },
    {
      timestamp: ISODate("2013-10-10T23:06:38.000Z"),
      type: "spot_EURUSD",
      value: 1.2346
    }
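In this first shape every reading is a standalone document, so a minute of per-second prices means 60 separate inserts. A minimal Python sketch of building such documents; the helper name is hypothetical, and the resulting list is the kind of input PyMongo's `insert_many` takes.

```python
from datetime import datetime, timedelta, timezone


def per_sample_docs(series_type, start, values, step=timedelta(seconds=1)):
    """One document per reading, as in the per-sample schema."""
    return [
        {"timestamp": start + i * step, "type": series_type, "value": v}
        for i, v in enumerate(values)
    ]


if __name__ == "__main__":
    start = datetime(2013, 10, 10, 23, 6, 37, tzinfo=timezone.utc)
    for doc in per_sample_docs("spot_EURUSD", start, [1.2345, 1.2346]):
        print(doc)
```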

This is comparable to a standard relational approach. In this case, you produce one document per value recorded, which causes a lot of insert operations. We can do better. Consider the following:

    {
      timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
      type: "spot_EURUSD",
      values: {
        0: 1.2345,
        …
        37: 1.2346,
        38: 1.2347,
        …
        59: 1.2343
      }
    }
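With one document per minute, a new reading only has to be written into its slot. A sketch in Python, assuming the field names used in this per-minute document (`timestamp_minute`, `type`, `values`); the helper names are hypothetical. The dicts have the shapes PyMongo's `insert_one` and `update_one` expect, and since BSON field names are strings, the seconds are used as string keys.

```python
from datetime import datetime, timezone


def minute_bucket_filter(ts, series_type):
    """Select the per-minute document that should hold the reading at ts."""
    minute_start = ts.replace(second=0, microsecond=0)
    return {"timestamp_minute": minute_start, "type": series_type}


def new_minute_bucket(ts, series_type, value):
    """The one document inserted per minute, seeded with its first reading."""
    doc = minute_bucket_filter(ts, series_type)
    doc["values"] = {str(ts.second): value}
    return doc


def record_second(ts, value):
    """$set update that writes one reading into its per-second slot."""
    return {"$set": {f"values.{ts.second}": value}}


if __name__ == "__main__":
    ts = datetime(2013, 10, 10, 23, 6, 37, tzinfo=timezone.utc)
    # With PyMongo: coll.update_one(minute_bucket_filter(ts, "spot_EURUSD"),
    #                               record_second(ts, 1.2346))
    print(minute_bucket_filter(ts, "spot_EURUSD"), record_second(ts, 1.2346))
```

Passing `upsert=True` to `update_one` is an alternative to the separate insert: when no document matches, MongoDB creates one from the filter's equality fields and then applies the `$set`.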

Now we write one document per minute and then perform 59 updates to fill in the remaining seconds. This is much better because each update is atomic, individual writes are smaller, and there are other performance and concurrency benefits. But what if we wanted to store an entire day, rather than a single minute or hour, in one document? With one value per minute, a flat map would hold 1440 entries (24 × 60), and getting to the last value would require walking along all of them. To improve on this, we can nest the values by hour and then by minute:

    {
      timestamp_day: ISODate("2013-10-10T00:00:00.000Z"),
      type: "spot_EURUSD",
      values: {
        0: { 0: 1.2343, 1: 1.2343, …, 59: 1.2343 },
        1: { 0: 1.2343, 1: 1.2343, …, 59: 1.2343 },
        …,
        22: { 0: 1.2343, 1: 1.2343, …, 59: 1.2343 },
        23: { 0: 1.2343, 1: 1.2343, …, 59: 1.2343 }
      }
    }

With this nested approach, reaching the very last value of the day requires walking at most 24 + 60 entries instead of 1440.
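For the nested day document, the dotted path to a slot is `values.<hour>.<minute>`. The sketch below builds that update and counts how many field names a linear scan passes to reach a slot, reproducing the 1440 versus 24 + 60 comparison. It assumes the document is identified by its midnight-UTC start in a field named `timestamp_day` (a hypothetical name), and the step count is a simplified model of the scan, not a measurement.

```python
from datetime import datetime, timezone


def day_bucket_filter(ts, series_type):
    """Select the per-day document (keyed by midnight) that holds ts."""
    day_start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return {"timestamp_day": day_start, "type": series_type}


def record_minute(ts, value):
    """$set update for the nested slot values.<hour>.<minute>."""
    return {"$set": {f"values.{ts.hour}.{ts.minute}": value}}


def fields_scanned(hour, minute, nested=True):
    """Field names passed by a linear scan to reach the (hour, minute) slot."""
    if nested:
        return (hour + 1) + (minute + 1)  # walk the hour keys, then the minute keys
    return hour * 60 + minute + 1  # one flat map of 1440 minutes


if __name__ == "__main__":
    print(fields_scanned(23, 59, nested=False))  # 1440
    print(fields_scanned(23, 59))                # 84
```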

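If we build each document in advance with every value filled in with padding (placeholder values of the same type as the real data), we can be sure that the document will not change size and therefore will not be moved on disk. This mattered most with the original MMAPv1 storage engine, which relocated documents that outgrew their allocated space.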
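Pre-filling a day document might look like the following sketch. It assumes the same hypothetical `timestamp_day` field and uses `0.0` as the padding value: a BSON double always takes 8 bytes, so overwriting a placeholder double with a real price leaves the document's size unchanged. The helper name is illustrative.

```python
from datetime import datetime, timezone


def preallocated_day(day_start, series_type, pad=0.0):
    """Full day document: 24 hour keys x 60 minute keys, all set to pad."""
    return {
        "timestamp_day": day_start,
        "type": series_type,
        "values": {
            str(h): {str(m): pad for m in range(60)} for h in range(24)
        },
    }


if __name__ == "__main__":
    doc = preallocated_day(datetime(2013, 10, 10, tzinfo=timezone.utc), "spot_EURUSD")
    # Insert once (e.g. coll.insert_one(doc)), then $set values.<h>.<m> as data arrives.
    print(sum(len(minutes) for minutes in doc["values"].values()))  # 1440
```

A `0.0` placeholder cannot be told apart from a genuine reading of `0.0`; if that matters, the application needs some other way to track which slots have been written.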

Answered by jtromans on Oct 18 '22.