Mongo DB - usage of map reduce or aggregation

Tags:

mongodb

django

mongoengine

I have a series of documents in a MongoDB collection that look like this:

{ 'time' : '2016-03-28 12:12:00', 'value' : 90 },
{ 'time' : '2016-03-28 12:13:00', 'value' : 82 },
{ 'time' : '2016-03-28 12:14:00', 'value' : 75 },
{ 'time' : '2016-03-28 12:15:00', 'value' : 72 },
{ 'time' : '2016-03-28 12:16:00', 'value' : 81 },
{ 'time' : '2016-03-28 12:17:00', 'value' : 90 },
etc....

The task is: with a threshold value of 80, find all times where the value enters below 80 or exits above 80:

{ 'time' : '2016-03-28 12:14:00', 'result' : 'enter' },
{ 'time' : '2016-03-28 12:16:00', 'result' : 'exit' },

Is there a way to write a map-reduce or aggregation query that would produce such a result? I tried looping through the sorted results, but that is very expensive in processing and memory, and I need to run a series of such checks.

PS. I am using Django and mongoengine to execute the call.

asked Mar 26 '16 03:03

bensiu

2 Answers

I'm not sure this is possible with the MongoDB aggregation framework alone since, as mentioned by @BlakesSeven, there is no link between consecutive documents. You need that link to check whether a value went below or above the threshold compared with the value in the previous document.

Here is a naive pure-Python solution (since the question is tagged with Django and MongoEngine) that loops over the sorted results, tracks which side of the threshold the previous value was on, and reports when the value drops below 80 or rises back to 80 or above (col is your collection reference):

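THRESHOLD = 80
# Documents must be visited in chronological order
cursor = col.find().sort("time")

# Remember which side of the threshold the first value is on
first_value = next(cursor)
more_than = first_value["value"] >= THRESHOLD

for document in cursor:
    if document["value"] < THRESHOLD:
        # Previous value was at or above the threshold: the series just dropped below it
        if more_than:
            print({"time": document["time"], "result": "enter"})
        more_than = False
    else:
        # Previous value was below the threshold: the series just came back up
        if not more_than:
            print({"time": document["time"], "result": "exit"})
        more_than = True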

For the provided sample data, it prints:

{'time': '2016-03-28 12:14:00', 'result': 'enter'}
{'time': '2016-03-28 12:16:00', 'result': 'exit'}
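
The same loop can be wrapped in a generator so that it works on any time-sorted iterable of documents and tolerates an empty result set. This is only a minimal sketch, not part of the original answer: the name find_crossings is hypothetical, and the sample list stands in for col.find().sort("time").

```python
def find_crossings(documents, threshold=80):
    """Yield enter/exit events from documents already sorted by time."""
    iterator = iter(documents)
    first = next(iterator, None)
    if first is None:
        return  # empty input: nothing to report
    above = first["value"] >= threshold
    for doc in iterator:
        now_above = doc["value"] >= threshold
        if above and not now_above:
            yield {"time": doc["time"], "result": "enter"}
        elif not above and now_above:
            yield {"time": doc["time"], "result": "exit"}
        above = now_above


# Sample data from the question, standing in for a sorted cursor
sample = [
    {"time": "2016-03-28 12:12:00", "value": 90},
    {"time": "2016-03-28 12:13:00", "value": 82},
    {"time": "2016-03-28 12:14:00", "value": 75},
    {"time": "2016-03-28 12:15:00", "value": 72},
    {"time": "2016-03-28 12:16:00", "value": 81},
    {"time": "2016-03-28 12:17:00", "value": 90},
]
events = list(find_crossings(sample))
for event in events:
    print(event)
```

Because it is a generator, only one document is held in memory at a time, which may help with the memory cost mentioned in the question.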

As a side note and an alternative solution: if you control how these records are inserted, then on each insert you can look up the latest stored value, compare both values against the threshold, and store the result in a separate field. Querying the points where the value enters and exits the threshold then becomes as easy as:

col.find({"result": {"$exists": True}})

You could call this approach "marking the threshold crossings beforehand". It probably makes sense only from a query-performance perspective, and only if you are going to run this query often.
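
A sketch of that idea, under stated assumptions: mark_crossing is a hypothetical helper that takes the most recently stored value and the new document, and adds a result field only when the threshold is crossed. Fetching the latest value and performing the actual insert are left out, since they depend on how the collection is written to.

```python
THRESHOLD = 80


def mark_crossing(previous_value, document, threshold=THRESHOLD):
    """Return a copy of `document`, tagged with 'result' if it crosses the threshold."""
    marked = dict(document)
    if previous_value is None:
        return marked  # first document: nothing to compare against
    was_above = previous_value >= threshold
    is_above = document["value"] >= threshold
    if was_above and not is_above:
        marked["result"] = "enter"
    elif not was_above and is_above:
        marked["result"] = "exit"
    return marked


# Previous value 82, new value 75: the series drops below the threshold
marked_enter = mark_crossing(82, {"time": "2016-03-28 12:14:00", "value": 75})
# Previous value 75, new value 72: still below, so no 'result' field is added
marked_same = mark_crossing(75, {"time": "2016-03-28 12:15:00", "value": 72})
print(marked_enter)
print(marked_same)
```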

answered Nov 06 '22 03:11

alecxe

You can transform the documents with the help of the aggregation framework and cursor iteration.

Example:

db.collection.aggregate([
  {$project:
    {
      value:1,
      "threshold":{$let:
        {
          vars: {threshold: 80 }, 
          in:   "$$threshold"
        }}
     }
  },
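  // Note: inside $match, "$threshold" is a literal string, not a field reference,
  // so this stage filters nothing; the strict $lt/$gt below already ignore values equal to the threshold.
  {$match:{value:{$ne: "$threshold"}}},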
  {$group:
     {
       _id:"$null", 
       low:{
         $max:{
             $cond:[{$lt:["$value","$threshold"]},"$value",-1]
          }
       },

       high:{
         $min:{
             // 10000000000 is an arbitrary sentinel value;
             // it must be greater than any value in the documents
             $cond:[{$gt:["$value","$threshold"]},"$value",10000000000] 
          }
       },

       threshold:{$first:"$threshold"}
     }
   }  
])

The aggregation returns a single document holding the largest value below the threshold (low) and the smallest value above it (high):

{ 
    "_id" : null, 
    "low" : NumberInt(75), 
    "high" : NumberInt(81), 
    "threshold" : NumberInt(80)
}
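
To make the $group stage concrete, here is a plain-Python sketch (not part of the original answer) of what low and high compute on the sample values: the largest value below the threshold and the smallest value above it. The sentinels mirror the ones in the pipeline; the variable names are for illustration only.

```python
THRESHOLD = 80
values = [90, 82, 75, 72, 81, 90]  # the 'value' fields from the sample documents

# $max over: value if value < threshold, else the sentinel -1
low = max(v if v < THRESHOLD else -1 for v in values)
# $min over: value if value > threshold, else the large sentinel
high = min(v if v > THRESHOLD else 10000000000 for v in values)

summary = {"low": low, "high": high, "threshold": THRESHOLD}
print(summary)
```

Since the two values are chosen by magnitude rather than by position in time, the follow-up lookup matches every document that holds either value.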

We can then find the documents matching these values. For example, in the mongo shell (the snippet relies on the shell's printjson helper), assuming the variable result holds the result of the aggregation query:

result.forEach(function(r){

   var documents = [];

   db.collection.find({$or:[{"value": r.low},{"value": r.high}]}).forEach(function(doc){

        var _doc = {};
        _doc.time = doc.time;
        _doc.result = doc.value < r.threshold ? "enter" : "exit";
        documents.push(_doc);
   });
   printjson(documents);
});

Given the sample input documents from the question:

{ 'time' : '2016-03-28 12:12:00', 'value' : 90 },
{ 'time' : '2016-03-28 12:13:00', 'value' : 82 },
{ 'time' : '2016-03-28 12:14:00', 'value' : 75 },
{ 'time' : '2016-03-28 12:15:00', 'value' : 72 },
{ 'time' : '2016-03-28 12:16:00', 'value' : 81 },
{ 'time' : '2016-03-28 12:17:00', 'value' : 90 },
etc....

the solution above will emit:

{
    "time" : "2016-03-28 12:14:00", 
    "result" : "enter"
}, 
{
    "time" : "2016-03-28 12:16:00", 
    "result" : "exit"
}

answered Nov 06 '22 03:11

Saleem
