I am a beginner in MongoDB and I am wondering what the purpose of the finalize function/step is in MongoDB's Map-Reduce. Everything that we do in the finalize() function can apparently be done in the reduce function, so I wonder what forces us to use finalize. I have researched this and found nothing. Thanks a lot for helping me.
While I know this question was asked and answered 3 years ago, I had the same question and figured future googlers may find this additional info helpful: reduce() may be called multiple times with the same key, with some of the values passed to it being what was returned by previous reduce() calls. This can be because the collection is not sorted by the key in question, an incremental Map-Reduce, parallel execution, etc. This is why reduce() should always return the same type of value that is passed to emit() by map(), for example.
So let's say your map function just emitted a single number per document, and you used your reduce function to calculate the sum and the average for each key:
function reduce(key, values) {
    var resultObj = {
        sum: Array.sum(values)
    };
    // Flawed on purpose: an average computed here is lost on a re-reduce.
    resultObj.average = resultObj.sum / values.length;
    return resultObj;
}
In this scenario, your code will behave erroneously if it is passed an array that contains a resultObj, as I'm not sure what happens when Array.sum() is passed an array of numbers and objects. Even if that were not an issue, this code would ignore any previously calculated averages and return an incorrect result.
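To make the failure concrete, here is a minimal sketch in plain JavaScript that runs outside MongoDB. It uses a simplified variant of the reducer that returns only the average; the name averagingReduce and the sample values are made up for illustration. It simulates one re-reduce, where the output of an earlier call is fed back in as a value:

```javascript
// Simplified, deliberately flawed reducer: returns the average of its inputs.
function averagingReduce(key, values) {
  var sum = 0;
  for (var i = 0; i < values.length; i++) {
    sum += values[i];
  }
  return sum / values.length;
}

// All values reduced in a single call: the correct average of 1, 2, 3.
var singlePass = averagingReduce("k", [1, 2, 3]);

// Simulated re-reduce: 1 and 2 are reduced first, then that partial result
// is reduced again together with 3.
var partial = averagingReduce("k", [1, 2]);
var reReduced = averagingReduce("k", [partial, 3]);

console.log(singlePass); // 2
console.log(reReduced);  // 2.25
```

The two answers differ because an average of averages throws away how many values each partial result stood for.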
finalize(), on the other hand, only gets called once per key, so it can return anything it wants, and (as the accepted answer mentions) it is run after all the data has been processed. So to do the above correctly, instead of emitting just a single number during the map phase you would emit something like { sum: myVal, count: 1 }. Then your reduce function would be:
function reduce(key, values) {
    // Same shape as the emitted values, so the result can safely be re-reduced.
    var resultObj = {
        sum: 0,
        count: 0
    };
    for (var i = 0; i < values.length; i++) {
        resultObj.sum = resultObj.sum + values[i].sum;
        resultObj.count = resultObj.count + values[i].count;
    }
    return resultObj;
}
...and then finally you could calculate the average in finalize:
function finalize(key, reducedValue) {
    return {
        sum: reducedValue.sum,
        average: reducedValue.sum / reducedValue.count
    };
}
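The whole pipeline can be checked outside MongoDB with a small simulation in plain JavaScript. This is only a sketch: the sample values and the helper names runInOnePass and runWithReReduce are made up for illustration, and the real engine decides for itself when to re-reduce.

```javascript
// Map-phase output for one key: each document contributes { sum, count }.
var emitted = [4, 8, 15, 16].map(function (v) {
  return { sum: v, count: 1 };
});

function reduce(key, values) {
  var resultObj = { sum: 0, count: 0 };
  for (var i = 0; i < values.length; i++) {
    resultObj.sum += values[i].sum;
    resultObj.count += values[i].count;
  }
  return resultObj;
}

function finalize(key, reducedValue) {
  return {
    sum: reducedValue.sum,
    average: reducedValue.sum / reducedValue.count
  };
}

// Hypothetical helper: everything is reduced in a single call.
function runInOnePass(key, values) {
  return finalize(key, reduce(key, values));
}

// Hypothetical helper: the first two values are reduced on their own, and
// that partial result is re-reduced with the rest before finalize runs once.
function runWithReReduce(key, values) {
  var partial = reduce(key, values.slice(0, 2));
  var combined = reduce(key, [partial].concat(values.slice(2)));
  return finalize(key, combined);
}

var onePass = runInOnePass("k", emitted);
var reReduced = runWithReReduce("k", emitted);
console.log(onePass);   // { sum: 43, average: 10.75 }
console.log(reReduced); // { sum: 43, average: 10.75 }
```

Both paths agree because reduce only ever adds up sums and counts, and the division happens exactly once in finalize. In the mongo shell, the same functions would be passed along the lines of db.collection.mapReduce(map, reduce, { out: { inline: 1 }, finalize: finalize }).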
One of the biggest reasons is that finalize is run AFTER everything is completed, on the final set of data. Not only that, but finalize also runs on keys that have only a single value, whereas reduce is skipped for those keys.
If you can do everything in reduce, then use reduce; you have no need for a finalize.
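The single-result case can be sketched in plain JavaScript as well. The simulate helper below is hypothetical and only mimics the behaviour described above, where reduce is not invoked for a key with a single emitted value while finalize still runs for every key:

```javascript
var reduceCalls = 0;

function reduce(key, values) {
  reduceCalls++;
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

function finalize(key, reducedValue) {
  return { total: reducedValue, finalized: true };
}

// Hypothetical stand-in for the engine: groups maps each key to its emitted values.
function simulate(groups) {
  var results = {};
  Object.keys(groups).forEach(function (key) {
    var values = groups[key];
    // A lone value skips reduce and is handed to finalize as-is.
    var reduced = values.length === 1 ? values[0] : reduce(key, values);
    results[key] = finalize(key, reduced);
  });
  return results;
}

var results = simulate({ a: [1, 2, 3], b: [7] });
console.log(reduceCalls); // 1
console.log(results.a);   // { total: 6, finalized: true }
console.log(results.b);   // { total: 7, finalized: true }
```

Any output shaping that lives only in reduce would therefore be missing for key b, which is one practical reason to put it in finalize.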