In MongoDB, I have the following map function:
var map = function() {
    emit(this.username, {count: 1, otherdata: otherdata});
};
and the following reduce function:
var reduce = function(key, values) {
    var total = 0; // running total for this reduce call
    values.forEach(function(value) {
        total += value.count; // note this line
    });
    return {count: total, otherdata: values[0].otherdata}; // please ignore otherdata
};
The problem is with the marked line:
total += value.count;
In my dataset, the reduce function is called 9 times, and the expected count is 8908.
With the line above, the result is correct: 8908.
But if I change the line to:
total += 1;
the result is only 909, roughly a tenth of the expected count.
I also tried print(value.count), and the printed value was 1.
What explains this behavior?
Short answer: value.count is not always equal to 1.
Long answer: this is the expected behavior of map-reduce. The reduce function aggregates the results of the map function, but it does so in small groups, producing intermediate results (subtotals, in your case). The reduce function is then run again on these intermediate results as if they were direct output of the map function, and so on until only one intermediate result is left for each key. That last one is the final result.
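To make the subtotal effect concrete, here is a small JavaScript simulation you can run with Node.js. It is only a sketch: MongoDB's real batching is internal, and the model below (each reduce call receives the previous partial result plus up to BATCH_SIZE new mapped values, with BATCH_SIZE = 1000) is an assumption, picked because it happens to be consistent with the 9 reduce calls and the 8908 and 909 totals in the question. The names reduceByCount, reduceByOne and runReduce are hypothetical, and otherdata is left out.

```javascript
// Hypothetical model: MongoDB's real batching is internal and may differ.
// Each reduce call receives the previous partial result (if any) plus up to
// BATCH_SIZE new values from map. BATCH_SIZE = 1000 is an assumption.
var BATCH_SIZE = 1000;

// Reduce from the question: sums the count field of every value.
function reduceByCount(key, values) {
  var total = 0;
  values.forEach(function (value) {
    total += value.count;
  });
  return { count: total };
}

// Variant from the question: adds 1 per value, ignoring its count field.
function reduceByOne(key, values) {
  var total = 0;
  values.forEach(function () {
    total += 1;
  });
  return { count: total };
}

// Feed mapped values to reduce in batches, re-reducing the previous partial result.
function runReduce(reduce, key, mapped) {
  var partial = null;
  var calls = 0;
  for (var i = 0; i < mapped.length; i += BATCH_SIZE) {
    var batch = mapped.slice(i, i + BATCH_SIZE);
    var input = partial === null ? batch : [partial].concat(batch);
    partial = reduce(key, input);
    calls += 1;
  }
  return { result: partial, calls: calls };
}

// 8908 values as map would emit them: {count: 1} each.
var mapped = [];
for (var i = 0; i < 8908; i++) {
  mapped.push({ count: 1 });
}

var byCount = runReduce(reduceByCount, "someUser", mapped);
var byOne = runReduce(reduceByOne, "someUser", mapped);
console.log(byCount.calls, byCount.result.count); // 9 8908
console.log(byOne.calls, byOne.result.count);     // 9 909
```

In this model, with total += 1 the last call sees the running subtotal of 8000 as a single value plus 908 fresh ones, so it returns 909. Summing value.count instead carries the subtotal forward.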
It can be seen as a pyramid of intermediate results:
emit(...) -|
           |- reduce ->|
emit(...) -|           |
                       |- reduce ->|
emit(...) -|           |           |
           |- reduce ->|           |
emit(...) -|                       |
                                   |-> reduce = final result
emit(...) -|                       |
           |- reduce ------------->|
emit(...) -|                       |
                                   |
emit(...) ---------------- reduce->|
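The number of reduce calls and the inputs each one receives are unpredictable and are meant to remain hidden. That is why you have to write a reduce function that returns data of the same type (same schema) as its input values, and that still gives the right answer when some of those values are its own earlier output. With total += 1, every intermediate result counts as 1 instead of contributing its subtotal, which is why your count comes out far too low.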
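One way to test a reduce function for this outside MongoDB is to compare reducing all values at once with reducing two partial results, and with re-reducing a single result. MongoDB's map-reduce documentation states requirements along these lines (the reduce output must have the same type as the mapped values, and reduce must be associative and idempotent). The helper isReReducible below is hypothetical, not a MongoDB API, and the split point is arbitrary.

```javascript
// Hypothetical helper (not a MongoDB API): checks that a reduce function can
// safely consume its own output, the way MongoDB may re-run it.
function isReReducible(reduce, key, values, splitAt) {
  var direct = reduce(key, values);
  // Split-and-combine: reduce two partial results together.
  var staged = reduce(key, [
    reduce(key, values.slice(0, splitAt)),
    reduce(key, values.slice(splitAt))
  ]);
  // Idempotence: reducing a single partial result must not change it.
  var again = reduce(key, [direct]);
  // JSON comparison is enough for the flat documents used here.
  var same = function (a, b) { return JSON.stringify(a) === JSON.stringify(b); };
  return same(direct, staged) && same(direct, again);
}

// Same shape as its inputs and sums the field, so it re-reduces correctly.
function reduceByCount(key, values) {
  var total = 0;
  values.forEach(function (value) {
    total += value.count;
  });
  return { count: total };
}

// Equivalent to the total += 1 variant: treats every input as 1.
function reduceByOne(key, values) {
  return { count: values.length };
}

var values = [];
for (var i = 0; i < 10; i++) {
  values.push({ count: 1 });
}

console.log(isReReducible(reduceByCount, "someUser", values, 4)); // true
console.log(isReReducible(reduceByOne, "someUser", values, 4));   // false
```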