I'm playing around with Map Reduce in MongoDB and Python and I've run into a strange limitation. I'm just trying to count the number of "book" records. It works when there are fewer than 100 records, but once there are more than 100 the count resets for some reason.
Here is my MR code and some sample outputs:
var M = function () {
    book = this.book;
    emit(book, {count : 1});
}
var R = function (key, values) {
    var sum = 0;
    values.forEach(function(x) {
        sum += 1;
    });
    var result = {
        count : sum
    };
    return result;
}
MR output when record count is 99:
{u'_id': u'superiors', u'value': {u'count': 99}}
MR output when record count is 101:
{u'_id': u'superiors', u'value': {u'count': 2.0}}
Any ideas?
Your reduce function should be summing up the count values, not just adding 1 for each value. Otherwise the output of one reduce call can't properly be used as input to another reduce call. Try this instead:
var R = function (key, values) {
    var sum = 0;
    values.forEach(function(x) {
        sum += x.count;
    });
    var result = {
        count : sum
    };
    return result;
}
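As a rough sanity check, the property this relies on can be tested in plain JavaScript outside MongoDB: reducing everything in one pass should give the same result as reducing two partial results. The emitted helper and the 100/1 split below are illustrative assumptions, not part of MongoDB.

```javascript
// Fixed reduce from the answer above: sums the partial counts.
var R = function (key, values) {
    var sum = 0;
    values.forEach(function (x) {
        sum += x.count;
    });
    return { count: sum };
};

// Hypothetical helper (not a MongoDB function): builds the n values
// that the map function would emit for a single key.
function emitted(n) {
    var out = [];
    for (var i = 0; i < n; i++) {
        out.push({ count: 1 });
    }
    return out;
}

var all = emitted(101);

// One pass over all 101 values.
var onePass = R("superiors", all);

// Two partial reduces (100 values, then 1), followed by a re-reduce.
var twoPass = R("superiors", [
    R("superiors", all.slice(0, 100)),
    R("superiors", all.slice(100))
]);

console.log(onePass.count, twoPass.count); // prints "101 101"
```

If the two numbers differ for a reduce function, its output cannot safely be fed back into itself.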
If 100 or more values are emitted for a key, the first 100 are sent to the reduce function, which returns:
{count: 100}
The one remaining value is then sent to the reduce function on its own, which returns:
{count: 1}
The partial results are now:
[{count: 100}, {count: 1}]
The reduce function is then called again on these partial results (this is the important part). Because your code does sum += 1 for each element, and the array has two elements, the final result is 2.
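The batching described above can be imitated in plain JavaScript to reproduce both outputs from the question. This is only a sketch: simulate and ones are hypothetical helpers, and the batch size of 100 is an assumption taken from this explanation rather than a documented constant.

```javascript
// Original reduce from the question: counts array elements and ignores x.count.
var buggyR = function (key, values) {
    var sum = 0;
    values.forEach(function (x) {
        sum += 1;
    });
    return { count: sum };
};

// Hypothetical stand-in for the engine: reduce the values in batches,
// then re-reduce the partial results until a single value is left.
function simulate(reduceFn, key, values, batchSize) {
    if (values.length <= batchSize) {
        return reduceFn(key, values);
    }
    var partials = [];
    for (var i = 0; i < values.length; i += batchSize) {
        partials.push(reduceFn(key, values.slice(i, i + batchSize)));
    }
    return simulate(reduceFn, key, partials, batchSize);
}

// Hypothetical helper: n values as the map function would emit them.
function ones(n) {
    var out = [];
    for (var i = 0; i < n; i++) {
        out.push({ count: 1 });
    }
    return out;
}

console.log(simulate(buggyR, "superiors", ones(99), 100).count);  // prints 99
console.log(simulate(buggyR, "superiors", ones(101), 100).count); // prints 2
```

With 99 values there is a single batch, so the bug stays hidden; with 101 values the second pass sees only two partial results and counts those instead of summing them.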
ref: http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-Amoretechnicalexplanation