
MongoDB MapReduce - Emit one key/one value doesn't call reduce

I'm new to MongoDB and MapReduce in general, and I came across this "quirk" (or at least, in my mind, a quirk).

Say I have objects in my collection like so:

{'key':5, 'value':5}

{'key':5, 'value':4}

{'key':5, 'value':1}

{'key':4, 'value':6}

{'key':4, 'value':4}

{'key':3, 'value':0}

My map function simply emits the key and the value.

My reduce function simply sums the values and, before returning the result, adds 1 (I did this to check whether the reduce function is even called).

My results follow:

{'_id': 3, 'value': 0}

{'_id':4, 'value': 11.0}

{'_id':5, 'value': 11.0}

As you can see, for keys 4 and 5 I get the expected answer of 11, BUT for key 3 (which has only one entry in the collection) I get the unexpected 0, not the 1 I would get if reduce had run!

Is this the natural behavior of MapReduce in general? Of MongoDB? Of PyMongo (which I am using)?

IamAlexAlright asked Jun 13 '12 19:06



2 Answers

The reduce function combines the values emitted for the same key into one value. If the map function emits a single value for a particular key (as is the case with key 3), the reduce function will not be called, and that value goes to the output unchanged.
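To see this outside of MongoDB, here is a minimal pure-Python sketch of the behavior. It is not MongoDB's implementation: the helper names (`map_doc`, `reduce_values`, `run_map_reduce`) are made up for illustration, and the only thing modeled is the assumption that reduce is skipped when a key has a single emitted value.

```python
from collections import defaultdict

# Sample documents from the question.
docs = [
    {"key": 5, "value": 5},
    {"key": 5, "value": 4},
    {"key": 5, "value": 1},
    {"key": 4, "value": 6},
    {"key": 4, "value": 4},
    {"key": 3, "value": 0},
]

def map_doc(doc):
    """Emit (key, value), like the map function described in the question."""
    return doc["key"], doc["value"]

def reduce_values(key, values):
    """Sum the values and add 1, like the reduce function in the question."""
    return sum(values) + 1

def run_map_reduce(documents):
    """Hypothetical driver: group emitted values by key, reduce only if needed."""
    grouped = defaultdict(list)
    for doc in documents:
        key, value = map_doc(doc)
        grouped[key].append(value)

    results = {}
    for key, values in grouped.items():
        if len(values) == 1:
            # Nothing to combine: the single emitted value is passed through.
            results[key] = values[0]
        else:
            results[key] = reduce_values(key, values)
    return results

results = run_map_reduce(docs)
print(results)
```

Under that assumption, keys 4 and 5 pick up the extra 1 from the reducer, while key 3 keeps its emitted value of 0, which matches the results in the question.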

Jenna answered Oct 16 '22 06:10



I realize this is an older question, but I came to it and felt like I still didn't understand why this behavior exists and how to build map/reduce functionality so it's a non-issue.

The reason MongoDB doesn't call the reduce function when a key has only a single emitted value is that it isn't necessary (I hope this will make more sense in a moment). The following are requirements for reduce functions:

  • The reduce function must return an object whose type is identical to the type of the value emitted by the map function.
  • The order of the elements in the valuesArray should not affect the output of the reduce function.
  • The reduce function must be idempotent.
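As a rough illustration of the last two requirements, the sketch below checks them in plain Python. The function names are hypothetical, and idempotence is taken here to mean that reducing an already-reduced result gives the same result, i.e. `reduce(key, [reduce(key, values)]) == reduce(key, values)`.

```python
def reduce_plus_one(key, values):
    """The question's reducer: sum the values, then add 1."""
    return sum(values) + 1

def reduce_sum(key, values):
    """A plain summing reducer."""
    return sum(values)

def is_idempotent(reduce_fn, key, values):
    """Re-reducing a reduced result must not change it."""
    once = reduce_fn(key, values)
    twice = reduce_fn(key, [once])
    return once == twice

def is_order_independent(reduce_fn, key, values):
    """Reversing the input values must not change the result."""
    return reduce_fn(key, values) == reduce_fn(key, list(reversed(values)))

values = [5, 4, 1]
plus_one_idempotent = is_idempotent(reduce_plus_one, 5, values)
sum_idempotent = is_idempotent(reduce_sum, 5, values)
sum_order_independent = is_order_independent(reduce_sum, 5, values)

print(plus_one_idempotent, sum_idempotent, sum_order_independent)
```

The "add 1" reducer fails the idempotence check because every extra pass through it adds another 1, so its output depends on how many times reduce happens to run. The plain sum passes both checks.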

The first requirement is very important, and a number of people seem to overlook it: I've seen many people do their mapping in the reduce function and then deal with the single-value case in the finalize function. This is the wrong way to address the issue, however.

Think about it like this: if a key has only a single emitted value, a simple optimization is to skip the reducer entirely (there's nothing to reduce). Those single values are still included in the output, but the intent of the reducer is to build an aggregate result for keys that have multiple values in your collection. If the mapper and reducer output the same type, you should not be able to tell from the object structure of the output which results went through the reducer and which did not. You shouldn't have to use a finalize function to correct the structure of objects that didn't run through the reducer.

In short, do your mapping in your map function, and reduce keys with multiple values into a single, aggregate result in your reduce function.
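Here is a minimal pure-Python sketch of that design, using the documents from the question. The helper names and the `count`/`total` fields are my own choices for illustration, and the driver only imitates the "skip reduce for a single value" behavior rather than reproducing MongoDB itself.

```python
from collections import defaultdict

# Sample documents from the question.
docs = [
    {"key": 5, "value": 5},
    {"key": 5, "value": 4},
    {"key": 5, "value": 1},
    {"key": 4, "value": 6},
    {"key": 4, "value": 4},
    {"key": 3, "value": 0},
]

def map_doc(doc):
    """Emit a value that already has the final output shape."""
    return doc["key"], {"count": 1, "total": doc["value"]}

def reduce_values(key, values):
    """Combine values into one value of the same shape as the inputs."""
    return {
        "count": sum(v["count"] for v in values),
        "total": sum(v["total"] for v in values),
    }

def run_map_reduce(documents):
    """Hypothetical driver that skips reduce for keys with a single value."""
    grouped = defaultdict(list)
    for doc in documents:
        key, value = map_doc(doc)
        grouped[key].append(value)
    return {
        key: values[0] if len(values) == 1 else reduce_values(key, values)
        for key, values in grouped.items()
    }

results = run_map_reduce(docs)
for key in sorted(results):
    print(key, results[key])
```

Because the mapper already emits the `count`/`total` shape, the entry for key 3 looks just like the entries for keys 4 and 5 even though it never passed through the reducer, so no finalize step is needed to patch it up.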

senfo answered Oct 16 '22 08:10
