I've written a MapReduce in MongoDB and would like to use a global variable as a cache to write to/read from. I know it is not possible to have global variables across map function instances - I just want a global variable within each function instance. This type of functionality exists in Hadoop's MapReduce so I was expecting it to be there in MongoDB. But the following does not seem to work:
var cache = {}; // Does not seem to work!
function () {
    var hashValue = this.varValue1 + this.varValue2;
    if (typeof(cache[hashValue]) != 'undefined') {
        // Do nothing, we've processed at least one input record with this hash
    } else {
        // Process the input record
        // Cache the record
        cache[hashValue] = '1';
    }
}
Is this not allowed in MongoDB's MapReduce implementation, or am I doing something wrong in JavaScript (not experienced in JS)?
Looking at the docs, I'm finding the following:
db.runCommand(
    { mapreduce : <collection>,
      map : <mapfunction>,
      reduce : <reducefunction>
      [, scope : <object where fields go into javascript global scope>]
    }
);
I think the "scope" parameter is what you need.
There's a test/example on GitHub that uses the "scope" parameter.
I'm still new to this stuff, but hopefully that's enough to get you started.
As Gates VP said, you need to add cache to the global scope. So, to give a complete answer based on your script, this is what you'll need to do:
db.runCommand(
    { mapreduce : <your collection>,
      map : <your map function, or reference to it>,
      reduce : <your reduce function, or reference to it>,
      scope : { cache : {} }
    }
);
The command injects the contents of the 'scope' object into the global context of your functions. The caching will then work the way you are using it in your map function. I've tested this.
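To make that concrete, here is a rough sketch of how the pieces might fit together with the map function from the question. The collection name 'myCollection', the reduce function, and the emit(hashValue, 1) call are placeholders I made up for illustration. The out option is also my addition, on the assumption that your server version requires it. Adjust all of these to your own setup.

```javascript
// Map function based on the one in the question. `cache` is not declared
// here: it is expected to arrive as a global through the `scope` option.
// `emit` is provided by MongoDB when the function runs on the server.
var mapFunction = function () {
    var hashValue = this.varValue1 + this.varValue2;
    if (typeof cache[hashValue] === 'undefined') {
        // First record seen with this hash in this map instance.
        emit(hashValue, 1);        // placeholder for the real processing
        cache[hashValue] = '1';    // remember that this hash was handled
    }
    // Otherwise do nothing: the hash has already been processed.
};

// Hypothetical reduce function: sums the values emitted for each key.
var reduceFunction = function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i];
    }
    return total;
};

var command = {
    mapreduce: 'myCollection',   // assumed collection name
    map: mapFunction,
    reduce: reduceFunction,
    out: { inline: 1 },          // assumption: return results inline
    scope: { cache: {} }         // `cache` becomes a global inside map/reduce
};

// Only run the command when a mongo shell `db` object is available.
if (typeof db !== 'undefined') {
    printjson(db.runCommand(command));
}
```

Note that the var cache = {}; line from the question is gone: the variable now comes from scope instead of being declared next to the map function.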