
Finalize Step in MongoDB Map-Reduce

I am a beginner in MongoDB and I am wondering what the purpose of the finalize function/step is in MongoDB's Map-Reduce. Everything we do in finalize() could apparently be done in the reduce function instead, so what would make us use finalize? I have researched this and found nothing. Thanks a lot for your help.

Yudho Ahmad Diponegoro asked Dec 03 '22 19:12


2 Answers

While I know this question was asked and answered 3 years ago, I had the same question and figured future googlers may find this additional info helpful: reduce() may be called multiple times for the same key, and some of the values passed to it may be the results returned by previous reduce() calls. This can happen because the collection is not sorted by the key in question, because of an incremental Map-Reduce, because of parallel execution, etc. This is why reduce() should always return the same type of value that map() passes to emit().

So let's say your map function just emitted a single number per document, and you use your reduce function to calculate the sum and the average for each key:

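// Flawed on purpose: the return type differs from what map() emits.
function reduce(key, values) {
    var resultObj = {
      sum: Array.sum(values)  // Array.sum is a mongo shell helper
    };

    // Only correct if `values` holds every raw number for this key.
    resultObj.average = resultObj.sum / values.length;
    return resultObj;
}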

In this scenario, your code will behave erroneously if it is passed an array that contains a resultObj from an earlier call, as I'm not sure what happens when Array.sum() is passed an array containing both numbers and objects. Even if that were not an issue, this code would ignore any previously calculated averages, and values.length would no longer match the number of documents, so it would return an incorrect result.
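To make the failure concrete, here is a minimal sketch in plain JavaScript, outside MongoDB. The helper `sum` is a hypothetical stand-in for the shell's Array.sum (its real behavior on mixed input may differ), and `naiveReduce` mirrors the flawed reduce above. The second call simulates a re-reduce, where one of the values is the result of an earlier call:

```javascript
// Hypothetical stand-in for the mongo shell's Array.sum (assumption:
// a plain left-to-right addition; the real helper may behave differently).
function sum(values) {
  return values.reduce(function (acc, v) { return acc + v; }, 0);
}

// Same shape as the flawed reduce: takes numbers, returns an object.
function naiveReduce(key, values) {
  var total = sum(values);
  return { sum: total, average: total / values.length };
}

// First pass over three raw numbers works as expected.
var partial = naiveReduce("a", [1, 2, 3]);

// Simulated re-reduce: an earlier result is mixed in with a raw number.
var rereduced = naiveReduce("a", [partial, 4]);

console.log(partial);   // { sum: 6, average: 2 }
console.log(rereduced); // sum is a concatenated string, average is NaN
```

With this stand-in, adding an object to a number falls back to string concatenation, so the sum is no longer numeric and the average becomes NaN.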

finalize(), on the other hand, is called only once per key, so it can return anything it wants, and (as the accepted answer mentions) it is run after all the data has been processed. So to do the above correctly, instead of emitting just a single number during the map phase, you would emit something like { sum: myVal, count: 1 }. Then your reduce function would be:

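function reduce(key, values) {
    // Same shape as the emitted values, so the result can be re-reduced.
    var resultObj = {
      sum: 0,
      count: 0
    };

    // Indexed loop: for...in on an array also visits inherited properties.
    for (var i = 0; i < values.length; i++) {
       resultObj.sum += values[i].sum;
       resultObj.count += values[i].count;
    }

    return resultObj;
}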

...and then finally you could calculate the average in finalize:

function finalize(key, reducedValue) {
   return {
     sum: reducedValue.sum,
     average: reducedValue.sum / reducedValue.count
   };
}
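As a quick sanity check of this approach, the following plain-JavaScript sketch (run outside MongoDB) reduces the same emitted values once in a single call and once in two steps, feeding the first result back in. The names `reduceFn`, `finalizeFn`, and the sample data are illustrative assumptions, not anything MongoDB requires:

```javascript
// Reduce that returns the same shape as the emitted values.
function reduceFn(key, values) {
  var result = { sum: 0, count: 0 };
  for (var i = 0; i < values.length; i++) {
    result.sum += values[i].sum;
    result.count += values[i].count;
  }
  return result;
}

// Finalize runs once on the fully reduced value, so it may change the shape.
function finalizeFn(key, reducedValue) {
  return {
    sum: reducedValue.sum,
    average: reducedValue.sum / reducedValue.count
  };
}

// What map() would emit for four documents holding 1, 2, 3 and 4.
var emitted = [1, 2, 3, 4].map(function (v) {
  return { sum: v, count: 1 };
});

// All values in one call.
var oneShot = reduceFn("a", emitted);

// Three values first, then the partial result plus the last value.
var partial = reduceFn("a", emitted.slice(0, 3));
var twoStep = reduceFn("a", [partial, emitted[3]]);

var result = finalizeFn("a", twoStep);
console.log(oneShot, twoStep, result);
```

Both reduction orders yield the same { sum, count } pair, which is why the average can safely be derived afterwards in finalize.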
David Deutsch answered Dec 29 '22 01:12


One of the biggest reasons is that finalize is run AFTER all reducing is completed, on the final set of data. Not only that, but finalize also runs for keys that have only a single value, whereas reduce is skipped for those keys.

If you can do everything in reduce, then use reduce; you have no need for a finalize.
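The single-value case can be illustrated with a toy driver in plain JavaScript. This is only a simulation under the assumption that the engine skips reduce for keys with one value and then applies finalize to every key; `runMapReduce` and the sample data are hypothetical and are not MongoDB's actual implementation:

```javascript
// Toy driver: groups emitted pairs by key, reduces only multi-value keys,
// then finalizes every key.
function runMapReduce(emits, reduceFn, finalizeFn) {
  var groups = {};
  emits.forEach(function (e) {
    (groups[e.key] = groups[e.key] || []).push(e.value);
  });

  var out = {};
  Object.keys(groups).forEach(function (key) {
    var values = groups[key];
    // A key with a single value never reaches reduce.
    var reduced = values.length === 1 ? values[0] : reduceFn(key, values);
    out[key] = finalizeFn ? finalizeFn(key, reduced) : reduced;
  });
  return out;
}

var reduceCalls = [];
function reduceFn(key, values) {
  reduceCalls.push(key);
  var result = { sum: 0, count: 0 };
  values.forEach(function (v) {
    result.sum += v.sum;
    result.count += v.count;
  });
  return result;
}

function finalizeFn(key, reducedValue) {
  return {
    sum: reducedValue.sum,
    average: reducedValue.sum / reducedValue.count
  };
}

var output = runMapReduce([
  { key: "a", value: { sum: 2, count: 1 } },
  { key: "a", value: { sum: 4, count: 1 } },
  { key: "b", value: { sum: 5, count: 1 } }
], reduceFn, finalizeFn);

console.log(reduceCalls); // only "a" was reduced
console.log(output);      // both keys carry an average
```

Key "b" never passes through reduce, yet it still gets an average, because finalize is applied to every key. Any logic placed only in reduce would have been skipped for it.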

Sammaye answered Dec 29 '22 00:12