I'm trying to do a simple map reduce in the Mongo shell, but the reduce function never gets called. This is my code:
db.sellers.mapReduce(
function(){ emit( this._id, 'Map') } ,
function(k,vs){ return 'Reduce' },
{ out: { inline: 1}})
And the result is:
{
"results" : [
{
"_id" : ObjectId("4da0bdb56bd728c276911e1a"),
"value" : "Map"
},
{
"_id" : ObjectId("4da0df9a6bd728c276911e1b"),
"value" : "Map"
}
],
"timeMillis" : 0,
"counts" : {
"input" : 2,
"emit" : 2,
"output" : 2
},
"ok" : 1
}
What's wrong?
I'm using MongoDB 1.8.1 32 bit on Ubuntu 10.10
The purpose of reduce is to, ahem, reduce the set of values associated with a given key to a single value (aggregating the results). If you emit only one value for each MapReduce key, reduce has nothing to do, because the work is already done. But if you emit two pairs for a given _id, reduce will be called:
emit(this._id, 'Map1');
emit(this._id, 'Map2');
This will call reduce with the following arguments:
reduce(_id, ['Map1', 'Map2'])
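The following is a simplified in-memory simulation in plain JavaScript (runnable with Node.js) that makes the grouping behaviour concrete. It is a sketch, not MongoDB's implementation. `simulateMapReduce` is a hypothetical helper. `emit` is passed to map as an argument instead of being a shell global. Real MongoDB may call reduce more than once per key. The simulation shows reduce being skipped for single-value keys and receiving ['Map1', 'Map2'] when two values share a key.

```javascript
// Simplified in-memory simulation of how map-reduce groups emitted values.
// NOT MongoDB's implementation: in the mongo shell `emit` is a global and
// MongoDB may call reduce more than once per key (re-reduce); here `emit`
// is passed to map as an argument and reduce runs at most once per key.
function simulateMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values, in emit order
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };

  // Call map with `this` bound to each document, like the mongo shell does.
  for (const doc of docs) map.call(doc, emit);

  const results = [];
  for (const [key, values] of groups) {
    // A key with a single value is passed through unchanged:
    // reduce is only called when there is something to combine.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

// Usage: two documents with distinct _ids (stand-ins for the ObjectIds above).
const sellers = [{ _id: "a" }, { _id: "b" }];
const reduceCalls = [];
function reduce(k, vs) {
  reduceCalls.push({ k, vs }); // record every invocation
  return "Reduce";
}

// One emit per _id, as in the question: reduce is never called.
simulateMapReduce(sellers, function (emit) { emit(this._id, "Map"); }, reduce);
console.log(reduceCalls.length); // 0

// Two emits per _id: reduce is called once per key with both values.
simulateMapReduce(sellers, function (emit) {
  emit(this._id, "Map1");
  emit(this._id, "Map2");
}, reduce);
console.log(reduceCalls[0]); // { k: 'a', vs: [ 'Map1', 'Map2' ] }
```

Map functions are written with `function` rather than arrow syntax because they rely on `this` being bound to the current document.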
More likely, you will want to use _id as the MapReduce key when filtering a dataset: emit only when a given record fulfills some condition. But again, reduce won't be called in this case, and that is expected.
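The sketch below illustrates that filtering pattern. The map function is written in mongo shell style but runs here as plain JavaScript with a stand-in `emit`. The documents and the `rating` field are hypothetical. Because _id is unique, each key receives at most one value, so there is nothing for reduce to combine.

```javascript
// Hypothetical seller documents; `rating` is an invented field for illustration.
const sellers = [
  { _id: "s1", rating: 5 },
  { _id: "s2", rating: 2 },
  { _id: "s3", rating: 4 },
];

// Stand-in for the shell's global emit: collect every emitted pair.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Map in mongo shell style: `this` is the current document,
// and we emit only when the document passes the filter.
function map() {
  if (this.rating >= 4) emit(this._id, this.rating);
}

sellers.forEach((doc) => map.call(doc));
console.log(emitted); // [ { key: 's1', value: 5 }, { key: 's3', value: 4 } ]

// _id is unique, so no key can have two values and there is nothing to reduce.
const keys = emitted.map((e) => e.key);
console.log(new Set(keys).size === keys.length); // true
```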
Well, MongoDB does not call the reduce function for a key if there is only one value for it.
In my opinion, this is bad. It should be left to my reducer code to decide whether to skip a single value or do some operation on it.
Now, if I have to do some operation on a single value, I end up writing a finalize function, and in finalize I try to tell which values have gone through the reducer and which have not.
I am quite sure it does not work this way in Hadoop.
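Another way to avoid telling reduced and unreduced values apart in finalize is to have map emit values in exactly the shape reduce returns. This is offered as an alternative, not as the approach described above. MongoDB's mapReduce documentation asks for this shape anyway, because reduce may be fed its own output. With it, a key that skips reduce already holds a finished value. The sketch is a simplified plain JavaScript simulation, not MongoDB's code. `runMapReduce`, `orders`, `seller` and `count` are hypothetical names.

```javascript
// Simplified simulation (not MongoDB's implementation): group emitted values
// by key and call reduce only for keys that received more than one value.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map.call(doc, emit));

  const out = {};
  for (const [key, values] of groups) {
    out[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return out;
}

// Hypothetical orders; we want the number of orders per seller.
const orders = [{ seller: "a" }, { seller: "a" }, { seller: "b" }];

// map emits { count: 1 }, the same shape that reduce returns,
// so seller "b" (one order, reduce skipped) needs no special handling.
function mapOrders(emit) {
  emit(this.seller, { count: 1 });
}
function reduceCounts(key, values) {
  let count = 0;
  for (const v of values) count += v.count; // also works on its own output
  return { count };
}

console.log(runMapReduce(orders, mapOrders, reduceCounts));
// { a: { count: 2 }, b: { count: 1 } }
```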
Map reduce will collect values with a common key into a single value.
In this case nothing is to be done because each value emitted by map has a different key. No reduction is needed.
db.sellers.mapReduce(
function(){ emit( this._id, 'Map') } ,
function(k,vs){ return 'Reduce' },
{ out: { inline: 1}})
This is not entirely clear from reading the documentation.
If you want reduce to be called, you could hard-code the same key for every document, like this:
db.sellers.mapReduce(
function(){ emit( 1, 'Map') } ,
function(k,vs){ return 'Reduce' },
{ out: { inline: 1}})
Now all the values emitted by map will be reduced until only one remains.
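Here is a minimal plain JavaScript simulation of the hard-coded key version. It is not MongoDB's code, and the two documents are stand-ins for the ones in the question. Both emits land under key 1, so this time reduce runs.

```javascript
// Minimal simulation (not MongoDB's code) of the hard-coded-key version:
// every document emits under key 1, so all values share one group.
const sellers = [{ _id: "s1" }, { _id: "s2" }]; // stand-ins for the two documents
const groups = new Map();
function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}
function map() { emit(1, "Map"); }
function reduce(k, vs) { return "Reduce"; }

sellers.forEach((doc) => map.call(doc));

const results = [];
for (const [key, values] of groups) {
  // Key 1 holds two values, so reduce runs this time.
  const value = values.length > 1 ? reduce(key, values) : values[0];
  results.push({ _id: key, value });
}
console.log(results); // [ { _id: 1, value: 'Reduce' } ]
```

In real MongoDB, reduce may be applied several times to partial batches, with earlier results fed back in, until one value per key remains. This simulation calls it only once.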