Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB Aggregate Time Series

I'm using MongoDB to store time series data using a similar structure to "The Document-Oriented Design" explained here: http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb

The objective is to query for the top 10 busiest minutes of the day on the whole system. Each document stores 1 hour of data using 60 sub-documents (1 for each minute). Each minute stores various metrics embedded in the "vals" field. The metric I care about is "orders". A sample document looks like this:

{
        "_id" : ObjectId("54d023802b1815b6ef7162a4"),
        "user" : "testUser",
        "hour" : ISODate("2015-01-09T13:00:00Z"),
        "vals" : {
                "0" : {
                        "orders" : 11,
                        "anotherMetric": 15
                },
                "1" : {
                        "orders" : 12,
                        "anotherMetric": 20
                },
                .
                .
                .
        }
}

Note there are many users in the system.

I've managed to flatten the structure (somewhat) by doing an aggregate with the following group object:

group = {
    $group: {
        _id: {
            hour: "$hour"
        },
        0: {$sum: "$vals.0.orders"},
        1: {$sum: "$vals.1.orders"},
        2: {$sum: "$vals.2.orders"},
        .
        .
        .
    }
}

But that just gives me 24 documents (1 for each hour) with the # of orders for each minute during that hour, like so:

{
    "_id" : {
            "hour" : ISODate("2015-01-20T14:00:00Z")
    },
    "0" : 282086,
    "1" : 239358,
    "2" : 289188,
    .
    .
    .
}

Now I need to somehow get the top 10 minutes of the day from this but I'm not sure how. I suspect it can be done with $project, but I'm not sure how.

like image 282
JCB Avatar asked Feb 03 '15 22:02

JCB


People also ask

Is MongoDB good for time series data?

From the very beginning, developers have been using MongoDB to store time-series data. MongoDB can be an extremely efficient engine for storing and processing time-series data, but you'd have to know how to correctly model it to have a performant solution, but that wasn't as straightforward as it could have been.

What is time series Collection in MongoDB?

MongoDB treats time series collections as writable non-materialized views backed by an internal collection. When you insert data, the internal collection automatically organizes time series data into an optimized storage format. When you query time series collections, you operate on one document per measurement.

Is MongoDB aggregate fast?

3, it was seen that MongoDB is typically faster on more complex queries. It's faster from disk when there are no indexes, whereas MySQL is faster from RAM. BI Connector is slower for simple queries and not as fast as hand-crafted aggregation.


1 Answers

You could aggregate as:

  • $match the documents for the specific date.
  • Construct the $group and $project objects before querying.
  • $group by the $hour, accumulate all the documents per hour per minute in an array.Keep the minute somewhere within the document.
  • $project a variable docs as $setUnion of all the documents per hour.
  • $unwind the documents.
  • $sort by orders
  • $limit the top 10 documents which is what we require.

Code:

var inputDate = new ISODate("2015-01-09T13:00:00Z");
var group = {};
var set = [];
for(var i=0;i<=60;i++){
    group[i] = {$push:{"doc":"$vals."+i,
                       "hour":"$_id.hour",
                       "min":{$literal:i}}};
    set.push("$"+i);
}
group["_id"] = {$hour:"$hour"};
var project = {"docs":{$setUnion:set}}

db.t.aggregate([
{$match:{"hour":{$lte:inputDate,$gte:inputDate}}},
{$group:group},
{$project:project},
{$unwind:"$docs"},
{$sort:{"docs.doc.orders":-1}},
{$limit:2},
{$project:{"_id":0,
           "hour":"$_id",
           "doc":"$docs.doc",
           "min":"$docs.min"}}
])
like image 58
BatScream Avatar answered Oct 25 '22 21:10

BatScream