
MapReduce results seem limited to 100?

I'm playing around with MapReduce in MongoDB and Python and I've run into a strange limitation. I'm just trying to count the number of "book" records. It works when there are fewer than 100 records, but once there are more than 100 the count seems to reset.

Here is my MR code and some sample outputs:

var M = function () {
  var book = this.book;
  emit(book, {count : 1});
}

var R = function (key, values) {
  var sum = 0;
  values.forEach(function(x) {
    sum += 1;
  });
  var result = {
    count : sum
  };
  return result;
}

MR output when record count is 99:

{u'_id': u'superiors', u'value': {u'count': 99}}

MR output when record count is 101:

{u'_id': u'superiors', u'value': {u'count': 2.0}}

Any ideas?

user1813867 asked Nov 10 '12 02:11


2 Answers

Your reduce function should be summing up the count values, not just adding 1 for each value. Otherwise the output of a reduce can't properly be used as input back into another reduce. Try this instead:

var R = function (key, values) {
  var sum = 0;
  values.forEach(function(x) {
    sum += x.count;
  });
  var result = {
    count : sum 
  };
  return result;
}
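
One way to sanity-check a reduce function outside MongoDB is to confirm that reducing a partial result together with the remaining values gives the same answer as reducing everything at once. The snippet below is a minimal sketch in plain JavaScript; the key "superiors" and the three sample values are only illustrative.

```javascript
// The corrected reduce function from above, written as a plain function.
function R(key, values) {
  var sum = 0;
  values.forEach(function (x) {
    sum += x.count;
  });
  return { count: sum };
}

// Illustrative input: three emitted values for one key.
var values = [{ count: 1 }, { count: 1 }, { count: 1 }];

// Reduce everything in one call.
var once = R("superiors", values);

// Reduce the first two values, then feed that output back in with the third.
var nested = R("superiors", [R("superiors", values.slice(0, 2)), values[2]]);

console.log(once.count === nested.count); // true
```

If the two counts differ, the reduce output cannot safely be fed back in as reduce input.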
JohnnyHK answered Nov 19 '22 13:11


If the number of emits for a key is 100 or more, the first 100 emits are sent to the reduce function as one batch, which produces:

{count: 100}

That leaves only 1 emit, which is processed next and gives:

{count: 1}

So the intermediate results are now:

[{count: 100}, {count: 1}]

The reduce function is then called again on this array (this is the important part). Because the forEach in your code does sum += 1 for each element, and the array has two elements, the result is 2.

ref: http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-Amoretechnicalexplanation
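
The sequence described above can be reproduced outside MongoDB with a small simulation. This is only a sketch: simulateMapReduce and emits are hypothetical helpers, not MongoDB APIs, and the batch size of 100 is taken from the explanation above rather than from any guaranteed server behavior.

```javascript
// Hypothetical helper: reduce the emitted values in batches, then
// re-reduce the partial results, as described in the answer above.
function simulateMapReduce(emitted, reduce, batchSize) {
  var partials = [];
  for (var i = 0; i < emitted.length; i += batchSize) {
    var batch = emitted.slice(i, i + batchSize);
    // Assumption: a single leftover value is passed through unreduced.
    partials.push(batch.length === 1 ? batch[0] : reduce("superiors", batch));
  }
  // Re-reduce only if there is more than one partial result.
  return partials.length === 1 ? partials[0] : reduce("superiors", partials);
}

// The reduce function from the question: counts elements, ignores x.count.
function buggyReduce(key, values) {
  var sum = 0;
  values.forEach(function (x) { sum += 1; });
  return { count: sum };
}

// The corrected reduce function: sums the count fields.
function fixedReduce(key, values) {
  var sum = 0;
  values.forEach(function (x) { sum += x.count; });
  return { count: sum };
}

// Build n emitted values of the form {count: 1}.
function emits(n) {
  var out = [];
  for (var i = 0; i < n; i++) out.push({ count: 1 });
  return out;
}

console.log(simulateMapReduce(emits(99), buggyReduce, 100).count);  // 99
console.log(simulateMapReduce(emits(101), buggyReduce, 100).count); // 2
console.log(simulateMapReduce(emits(101), fixedReduce, 100).count); // 101
```

With 99 emits there is only one batch, so the bug stays hidden. With 101 emits the second reduce call sees two partial results and counts them instead of adding their count fields.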

Chien-Wei Huang answered Nov 19 '22 15:11