I'm trying to run map-reduce on MongoDB in the mongo shell. For some reason, in the reduce phase, I get several calls for the same key (instead of a single one), so I get wrong results. I'm not an expert in this domain, so maybe I'm making some stupid mistake. Any help appreciated.
Thanks.
This is my small example:
I'm creating 10000 documents:
var i = 0;
db.docs.drop();
while (i < 10000) {
    db.docs.insert({text: "line " + i, index: i});
    i++;
}
Then I run map-reduce keyed on the index modulo 10 (so I expect to get 1000 in each "bucket"):
db.docs.mapReduce(
    function() {
        emit(this.index % 10, 1);
    },
    function(key, values) {
        return values.length;
    },
    {
        out: {inline: 1}
    }
);
However, I get the following results:
{
    "results" : [
        { "_id" : 0, "value" : 21 },
        { "_id" : 1, "value" : 21 },
        { "_id" : 2, "value" : 21 },
        { "_id" : 3, "value" : 21 },
        { "_id" : 4, "value" : 21 },
        { "_id" : 5, "value" : 21 },
        { "_id" : 6, "value" : 21 },
        { "_id" : 7, "value" : 21 },
        { "_id" : 8, "value" : 21 },
        { "_id" : 9, "value" : 21 }
    ],
    "timeMillis" : 76,
    "counts" : {
        "input" : 10000,
        "emit" : 10000,
        "reduce" : 500,
        "output" : 10
    },
    "ok" : 1
}
Map/reduce is essentially a recursive operation. In particular, the documented requirements for the reduce function include the following statement:

    MongoDB can invoke the reduce function more than once for the same key.
    In this case, the previous output from the reduce function for that key
    will become one of the input values to the next reduce function
    invocation for that key.

Therefore, your reduce function has to expect that one of its input values may be the number already counted by a previous invocation. The following code handles that by actually adding the values:
db.docs.mapReduce(
    function() { emit(this.index % 10, 1); },
    function(key, values) { return Array.sum(values); },
    { out: {inline: 1} }
);
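To see why the original reduce goes wrong, here is a small sketch in plain JavaScript (not mongo shell; `simulateReduce` and the chunk size of 100 are hypothetical stand-ins for the engine's internal batching). It feeds each partial reduce output back in as an ordinary input value, exactly as the documentation describes:

```javascript
// Simulate incremental reduce: values for one key arrive in chunks, and
// each chunk's reduce output re-enters as an input to a later reduce call.
function simulateReduce(reduceFn, values, chunkSize) {
    let pending = values.slice();
    while (pending.length > 1) {
        const chunk = pending.splice(0, chunkSize);
        // the previous partial result becomes one of the next input values
        pending.unshift(reduceFn(null, chunk));
    }
    return pending[0];
}

const ones = new Array(1000).fill(1);

// Buggy reduce: counts input values instead of summing them, so every
// partial result (which may stand for hundreds of documents) counts as 1.
const counted = simulateReduce(
    (key, values) => values.length, ones, 100);

// Correct reduce: sums input values, so partial results combine properly.
const summed = simulateReduce(
    (key, values) => values.reduce((a, b) => a + b, 0), ones, 100);

// summed is 1000; counted collapses to a much smaller, wrong number.
```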
Now emit(key, 1) also makes more sense, because 1 is no longer just an arbitrary placeholder used to fill the values array; its actual value contributes to the result.
As a sidenote, note how dangerous this is: for a smaller dataset, the broken reduce function might have returned the correct result by accident, because the engine decided that splitting the work (and therefore re-reducing) wasn't necessary.
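A quick way to catch such a bug before it bites, sketched here in plain JavaScript (the chunk boundaries are arbitrary, chosen only for illustration), is to check that the reduce function gives the same answer whether the values are reduced in one pass or in pre-reduced chunks whose outputs re-enter as inputs:

```javascript
// A reduce function is safe for re-reduce only if reducing all values in
// one pass equals reducing a mix of raw values and partial results.
const reduceSum = (key, values) => values.reduce((a, b) => a + b, 0);

const values = new Array(17).fill(1);

// one pass over all raw values
const onePass = reduceSum(null, values);

// two chunks are pre-reduced; their outputs re-enter as ordinary inputs
const rereduced = reduceSum(null, [
    reduceSum(null, values.slice(0, 5)),
    reduceSum(null, values.slice(5, 12)),
    ...values.slice(12),
]);

// onePass and rereduced should both be 17
```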