Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mongo DB - usage of map reduce or aggregation

I have series of documents in MongoDB collection that looks like this:

{ 'time' : '2016-03-28 12:12:00', 'value' : 90 },
{ 'time' : '2016-03-28 12:13:00', 'value' : 82 },
{ 'time' : '2016-03-28 12:14:00', 'value' : 75 },
{ 'time' : '2016-03-28 12:15:00', 'value' : 72 },
{ 'time' : '2016-03-28 12:16:00', 'value' : 81 },
{ 'time' : '2016-03-28 12:17:00', 'value' : 90 },
etc....

The tasks is - with trash hold value of 80 find all times where value is entering below 80 and exiting above 80

{ 'time' : '2016-03-28 12:14:00', 'result' : 'enter' },
{ 'time' : '2016-03-28 12:16:00', 'result' : 'exit' },

Is it way to have map reduce or aggregation query that would produce such result ? I was trying to loop thru sorted results, but it is very processing and memory expensive - I need to do series of such checks.

PS. I am using Django and mongoengine to execute call.

like image 346
bensiu Avatar asked Mar 26 '16 03:03

bensiu


People also ask

What is the difference between mapReduce and aggregation in MongoDB?

MapReduce of MongoDB is based on JavaScript using the SpiderMonkey engine and the queries are executed in a single thread. On the other hand, Aggregation Pipeline queries run on compiled C++ code which makes them faster as it is not interpreted like JavaScript.

Can mapReduce be used for aggregation?

Map-reduce operations can be rewritten using aggregation pipeline operators, such as $group , $merge , and others. For map-reduce operations that require custom functionality, MongoDB provides the $accumulator and $function aggregation operators starting in version 4.4.

Does MongoDB use mapReduce?

As per the MongoDB documentation, Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results. MongoDB uses mapReduce command for map-reduce operations.

When should I use aggregation in MongoDB?

You can use aggregation operations to: Group values from multiple documents together. Perform operations on the grouped data to return a single result. Analyze data changes over time.


2 Answers

I'm not sure this is possible with the MongoDB aggregation framework alone since, as mentioned by @BlakesSeven, there is no link/connection between the subsequent documents. And you need this connection to check if the new value went below or above the desired threshold comparing to what the value was right before it, in a previous document.

Here is a naive pure-python (since it is tagged with Django and MongoEngine) solution that loops over the sorted results maintaining the threshold-track variable and catching when it goes lower or higher 80 (col is your collection reference):

THRESHOLD = 80
cursor = col.find().sort("time")

first_value = next(cursor)
more_than = first_value["value"] >= THRESHOLD

for document in cursor:
    if document["value"] < THRESHOLD:
        if more_than:
            print({"time": document["time"], "result": "enter"})
        more_than = False
    else:
        if not more_than:
            print({"time": document["time"], "result": "exit"})
        more_than = True

For the provided sample data, it prints:

{'time': '2016-03-28 12:14:00', 'result': 'enter'}
{'time': '2016-03-28 12:16:00', 'result': 'exit'}

As a side note and an alternative solution..if you have control over the how these records are inserted, when you insert a document into this collection, you may check what is the latest value, compare it to the threshold and set the result as a separate field. Then, querying the entering and exiting the threshold points would become as easy as:

col.find({"result" : {$exists : true}})

You can name this approach as "marking the threshold values beforehand". This probably makes sense only from querying/searching performance perspective and if you are going to do this often.

like image 195
alecxe Avatar answered Nov 06 '22 03:11

alecxe


You can achieve transformation of documents easily with help of aggregation framework and cursor iteration.

Example:

db.collection.aggregate([
  {$project:
    {
      value:1,
      "threshold":{$let:
        {
          vars: {threshold: 80 }, 
          in:   "$$threshold"
        }}
     }
  },
  {$match:{value:{$ne: "$threshold"}}},
  {$group:
     {
       _id:"$null", 
       low:{
         $max:{
             $cond:[{$lt:["$value","$threshold"]},"$value",-1]
          }
       },

       high:{
         $min:{
             // 10000000000 is a superficial value. 
             // need something greater than values in documents
             $cond:[{$gt:["$value","$threshold"]},"$value",10000000000] 
          }
       },

       threshold:{$first:"$threshold"}
     }
   }  
])

Aggregation framework will return a document with two values.

{ 
    "_id" : null, 
    "low" : NumberInt(75), 
    "high" : NumberInt(81), 
    "threshold" : NumberInt(80)
}

We can easily find documents matching return criteria. e.g. in NodeJS we can easily do this. assuming variable result holds result from aggregation query.

result.forEach(function(r){

   var documents = [];

   db.collection.find({$or:[{"value": r.low},{"value": r.high}]}).forEach(function(doc){

        var _doc = {};
        _doc.time = doc.time;
        _doc.result = doc.value < r.threshold ? "enter" : "exit";
        documents.push(_doc);
   });
   printjson(documents);
});

As you mention, if your input documents are (sample)

{ 'time' : '2016-03-28 12:12:00', 'value' : 90 },
{ 'time' : '2016-03-28 12:13:00', 'value' : 82 },
{ 'time' : '2016-03-28 12:14:00', 'value' : 75 },
{ 'time' : '2016-03-28 12:15:00', 'value' : 72 },
{ 'time' : '2016-03-28 12:16:00', 'value' : 81 },
{ 'time' : '2016-03-28 12:17:00', 'value' : 90 },
etc....

Query above in solution will emit:

{
    "time" : "2016-03-28 12:14:00", 
    "result" : "enter"
}, 
{
    "time" : "2016-03-28 12:16:00", 
    "result" : "exit"
}
like image 42
Saleem Avatar answered Nov 06 '22 03:11

Saleem