I have series of documents in MongoDB collection that looks like this:
{ 'time' : '2016-03-28 12:12:00', 'value' : 90 },
{ 'time' : '2016-03-28 12:13:00', 'value' : 82 },
{ 'time' : '2016-03-28 12:14:00', 'value' : 75 },
{ 'time' : '2016-03-28 12:15:00', 'value' : 72 },
{ 'time' : '2016-03-28 12:16:00', 'value' : 81 },
{ 'time' : '2016-03-28 12:17:00', 'value' : 90 },
etc....
The tasks is - with trash hold value of 80 find all times where value is entering
below 80 and exiting
above 80
{ 'time' : '2016-03-28 12:14:00', 'result' : 'enter' },
{ 'time' : '2016-03-28 12:16:00', 'result' : 'exit' },
Is it way to have map reduce or aggregation query that would produce such result ? I was trying to loop thru sorted results, but it is very processing and memory expensive - I need to do series of such checks.
PS. I am using Django and mongoengine to execute call.
MapReduce of MongoDB is based on JavaScript using the SpiderMonkey engine and the queries are executed in a single thread. On the other hand, Aggregation Pipeline queries run on compiled C++ code which makes them faster as it is not interpreted like JavaScript.
Map-reduce operations can be rewritten using aggregation pipeline operators, such as $group , $merge , and others. For map-reduce operations that require custom functionality, MongoDB provides the $accumulator and $function aggregation operators starting in version 4.4.
As per the MongoDB documentation, Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results. MongoDB uses mapReduce command for map-reduce operations.
You can use aggregation operations to: Group values from multiple documents together. Perform operations on the grouped data to return a single result. Analyze data changes over time.
I'm not sure this is possible with the MongoDB aggregation framework alone since, as mentioned by @BlakesSeven, there is no link/connection between the subsequent documents. And you need this connection to check if the new value went below or above the desired threshold comparing to what the value was right before it, in a previous document.
Here is a naive pure-python (since it is tagged with Django and MongoEngine) solution that loops over the sorted results maintaining the threshold-track variable and catching when it goes lower or higher 80 (col
is your collection reference):
THRESHOLD = 80
cursor = col.find().sort("time")
first_value = next(cursor)
more_than = first_value["value"] >= THRESHOLD
for document in cursor:
if document["value"] < THRESHOLD:
if more_than:
print({"time": document["time"], "result": "enter"})
more_than = False
else:
if not more_than:
print({"time": document["time"], "result": "exit"})
more_than = True
For the provided sample data, it prints:
{'time': '2016-03-28 12:14:00', 'result': 'enter'}
{'time': '2016-03-28 12:16:00', 'result': 'exit'}
As a side note and an alternative solution..if you have control over the how these records are inserted, when you insert a document into this collection, you may check what is the latest value
, compare it to the threshold and set the result
as a separate field. Then, querying the entering and exiting the threshold points would become as easy as:
col.find({"result" : {$exists : true}})
You can name this approach as "marking the threshold values beforehand". This probably makes sense only from querying/searching performance perspective and if you are going to do this often.
You can achieve transformation of documents easily with help of aggregation framework and cursor iteration.
Example:
db.collection.aggregate([
{$project:
{
value:1,
"threshold":{$let:
{
vars: {threshold: 80 },
in: "$$threshold"
}}
}
},
{$match:{value:{$ne: "$threshold"}}},
{$group:
{
_id:"$null",
low:{
$max:{
$cond:[{$lt:["$value","$threshold"]},"$value",-1]
}
},
high:{
$min:{
// 10000000000 is a superficial value.
// need something greater than values in documents
$cond:[{$gt:["$value","$threshold"]},"$value",10000000000]
}
},
threshold:{$first:"$threshold"}
}
}
])
Aggregation framework will return a document with two values.
{
"_id" : null,
"low" : NumberInt(75),
"high" : NumberInt(81),
"threshold" : NumberInt(80)
}
We can easily find documents matching return criteria. e.g. in NodeJS we can easily do this. assuming variable result
holds result from aggregation query.
result.forEach(function(r){
var documents = [];
db.collection.find({$or:[{"value": r.low},{"value": r.high}]}).forEach(function(doc){
var _doc = {};
_doc.time = doc.time;
_doc.result = doc.value < r.threshold ? "enter" : "exit";
documents.push(_doc);
});
printjson(documents);
});
As you mention, if your input documents are (sample)
{ 'time' : '2016-03-28 12:12:00', 'value' : 90 },
{ 'time' : '2016-03-28 12:13:00', 'value' : 82 },
{ 'time' : '2016-03-28 12:14:00', 'value' : 75 },
{ 'time' : '2016-03-28 12:15:00', 'value' : 72 },
{ 'time' : '2016-03-28 12:16:00', 'value' : 81 },
{ 'time' : '2016-03-28 12:17:00', 'value' : 90 },
etc....
Query above in solution will emit:
{
"time" : "2016-03-28 12:14:00",
"result" : "enter"
},
{
"time" : "2016-03-28 12:16:00",
"result" : "exit"
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With