MongoDB MapReduce, return only when count > 1

I have data in MongoDB. The structure of one object is like this:

```javascript
{
    "_id" : ObjectId("5395177980a6b1ccf916312c"),
    "institutionId" : "831",
    "currentObject" : {
        "systemIdentifiers" : [
            {
                "value" : "24387",
                "system" : "ABC"
            }
        ]
    }
}
```

I need to know how many objects share the same institutionId and systemIdentifiers[0].value, and I want to return only the objects that are duplicated in that way. To do that, I group them by these two IDs and count the occurrences.

A group (a pair of IDs) should be returned only when its count is greater than 1.

This is the code that does the grouping using MapReduce:

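```javascript
// Emit one {count: 1} per document, keyed by (institutionId, first systemIdentifiers value).
var map = function() {
    var key = this.institutionId;
    var val = this.currentObject.systemIdentifiers[0].value;
    emit({"institutionId": key, "workId": val}, {count: 1});
};
// Sum the partial counts. MongoDB may call reduce again on its own output,
// so it returns the same shape ({count: n}) that map emits.
var reduce = function(key, values) {
    var count = 0;
    values.forEach(function(v) {
        count += v.count;
    });
    return {count: count};
};
db.name.mapReduce(map, reduce, {out: "grouped"});
db.grouped.find();
```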
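To see why a second step is needed, the grouping can be emulated outside the database. The sketch below is a simplified, hypothetical `runMapReduce` helper (not MongoDB's implementation: keys are compared by their JSON form and reduce runs at most once per key, only for keys with more than one emitted value) applied to the same map/reduce logic and some made-up sample documents. The unique pair still shows up in `grouped` with a count of 1.

```javascript
// Simplified in-memory emulation of MongoDB map-reduce (illustration only):
// keys are compared by their JSON form, and reduce runs once per key, only
// when that key received more than one emitted value.
let emit; // rebound by runMapReduce while the map function executes

function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // JSON-serialized key -> { key, values }
  emit = (key, value) => {
    const k = JSON.stringify(key);
    if (!groups.has(k)) groups.set(k, { key, values: [] });
    groups.get(k).values.push(value);
  };
  docs.forEach((d) => map.call(d)); // map reads the document via `this`
  return [...groups.values()].map(({ key, values }) => ({
    _id: key,
    value: values.length > 1 ? reduce(key, values) : values[0],
  }));
}

// Same logic as the question's map and reduce functions.
const map = function () {
  const key = this.institutionId;
  const val = this.currentObject.systemIdentifiers[0].value;
  emit({ institutionId: key, workId: val }, { count: 1 });
};
const reduce = function (key, values) {
  let count = 0;
  values.forEach((v) => { count += v.count; });
  return { count: count };
};

// Hypothetical sample documents: two share the same pair, one is unique.
const makeDoc = (institutionId, value) => ({
  institutionId,
  currentObject: { systemIdentifiers: [{ value, system: "ABC" }] },
});
const docs = [makeDoc("1004", "591426"), makeDoc("1004", "591426"), makeDoc("831", "24387")];

const grouped = runMapReduce(docs, map, reduce);
// The unique pair is still in `grouped` with count 1, which is why the
// question needs a separate filtering step to keep only the duplicates.
const duplicates = grouped.filter((d) => d.value.count > 1);
```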

To get only the groups with a count greater than 1, I run:

```javascript
db.grouped.aggregate([{$match: {"value.count": {$gt: 1}}}])
```

An example result is the following:

```json
{
    "_id" : {
        "institutionId" : "1004",
        "workId" : "591426"
    },
    "value" : {
        "count" : 2
    }
}
```

But I am curious whether it is possible to do this with MapReduce alone, as a single statement, e.g. by adding a finalize function.

asked Feb 21 '26 at 00:02 by Szymon Roziewski


1 Answer

If only a single document has a given key, that key never goes through reduce; it is considered already reduced. That is how MongoDB map-reduce behaves, as the documentation states:

> MongoDB will not call the reduce function for a key that has only a single value.
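This rule can be illustrated with a small, hypothetical emulation (the `runMapReduce` helper and the flattened sample documents are assumptions made for the example, not MongoDB APIs). Recording every key that reduce sees shows that the unique pair never reaches it and keeps the `{count: 1}` value emitted by map, so no logic placed inside reduce can filter it out.

```javascript
// Simplified in-memory emulation of MongoDB map-reduce (illustration only).
// Like the documented behaviour, it skips reduce for a key with a single value.
let emit; // rebound by runMapReduce while the map function executes

function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // JSON-serialized key -> { key, values }
  emit = (key, value) => {
    const k = JSON.stringify(key);
    if (!groups.has(k)) groups.set(k, { key, values: [] });
    groups.get(k).values.push(value);
  };
  docs.forEach((d) => map.call(d));
  return [...groups.values()].map(({ key, values }) => ({
    _id: key,
    value: values.length > 1 ? reduce(key, values) : values[0],
  }));
}

const reducedKeys = []; // workId of every key that reduce is called for

// Hypothetical flattened documents, to keep the example short.
const docs = [
  { institutionId: "1004", workId: "591426" },
  { institutionId: "1004", workId: "591426" },
  { institutionId: "831", workId: "24387" },
];
const map = function () {
  emit({ institutionId: this.institutionId, workId: this.workId }, { count: 1 });
};
const reduce = function (key, values) {
  reducedKeys.push(key.workId);
  return { count: values.reduce((sum, v) => sum + v.count, 0) };
};

const result = runMapReduce(docs, map, reduce);
// reducedKeys is ["591426"]: the "24387" key bypassed reduce entirely and
// kept the { count: 1 } value that map emitted.
```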

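Using finalize doesn't help either. If your finalize function returns reducedVal when count > 1 and null otherwise, you still get a result document for every key, just with null (instead of a count of 1) as its value; finalize can change a key's value but cannot drop the key.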
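Extending the same kind of hypothetical emulation with a finalize step makes this concrete. The sketch assumes, as in MongoDB, that finalize is applied to every key's value, including keys that skipped reduce; `runMapReduce` and the sample documents are made up for the illustration.

```javascript
// Simplified in-memory emulation of MongoDB map-reduce with finalize
// (illustration only): finalize is applied to every key's value.
let emit; // rebound by runMapReduce while the map function executes

function runMapReduce(docs, map, reduce, finalize) {
  const groups = new Map(); // JSON-serialized key -> { key, values }
  emit = (key, value) => {
    const k = JSON.stringify(key);
    if (!groups.has(k)) groups.set(k, { key, values: [] });
    groups.get(k).values.push(value);
  };
  docs.forEach((d) => map.call(d));
  return [...groups.values()].map(({ key, values }) => {
    const reduced = values.length > 1 ? reduce(key, values) : values[0];
    return { _id: key, value: finalize(key, reduced) }; // one output per key
  });
}

// Hypothetical flattened documents, to keep the example short.
const docs = [
  { institutionId: "1004", workId: "591426" },
  { institutionId: "1004", workId: "591426" },
  { institutionId: "831", workId: "24387" },
];
const map = function () {
  emit({ institutionId: this.institutionId, workId: this.workId }, { count: 1 });
};
const reduce = function (key, values) {
  return { count: values.reduce((sum, v) => sum + v.count, 0) };
};
// The finalize idea from the question: keep the value only when count > 1.
const finalize = function (key, reducedVal) {
  return reducedVal.count > 1 ? reducedVal : null;
};

const result = runMapReduce(docs, map, reduce, finalize);
// result still has two entries; the unique pair's value is now null.
```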

I am afraid that with a single map-reduce, keys with a count of 1 will always be in the result, since every key emitted by map ends up in the output.

You can chain two map-reduce operations, where the second map does not emit the documents whose count is less than 2. But I don't think this is better than the extra query in your example.
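In the mongo shell the second pass would be another call such as `db.grouped.mapReduce(map2, reduce2, {out: "duplicates"})`, where `map2`, `reduce2` and the `duplicates` collection name are hypothetical. The sketch below emulates both passes in plain JavaScript with a simplified, hypothetical `runMapReduce` helper so it runs without a database; the second map reads the `{_id, value}` documents produced by the first pass and only emits groups with a count of at least 2.

```javascript
// Simplified in-memory emulation of MongoDB map-reduce (illustration only).
let emit; // rebound by runMapReduce while the map function executes

function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // JSON-serialized key -> { key, values }
  emit = (key, value) => {
    const k = JSON.stringify(key);
    if (!groups.has(k)) groups.set(k, { key, values: [] });
    groups.get(k).values.push(value);
  };
  docs.forEach((d) => map.call(d));
  return [...groups.values()].map(({ key, values }) => ({
    _id: key,
    value: values.length > 1 ? reduce(key, values) : values[0],
  }));
}

// Hypothetical flattened documents, to keep the example short.
const docs = [
  { institutionId: "1004", workId: "591426" },
  { institutionId: "1004", workId: "591426" },
  { institutionId: "831", workId: "24387" },
];

// First pass: group by the pair of IDs and count, as in the question.
const map1 = function () {
  emit({ institutionId: this.institutionId, workId: this.workId }, { count: 1 });
};
const reduce1 = function (key, values) {
  return { count: values.reduce((sum, v) => sum + v.count, 0) };
};
const grouped = runMapReduce(docs, map1, reduce1);

// Second pass over the first pass's { _id, value } output:
// groups with a count below 2 are simply never emitted.
const map2 = function () {
  if (this.value.count >= 2) emit(this._id, this.value);
};
// Each _id is already unique after the first pass, so this is never invoked;
// it only exists because map-reduce requires a reduce function.
const reduce2 = function (key, values) {
  return values[0];
};
const duplicates = runMapReduce(grouped, map2, reduce2);
```

Functionally this matches the question's `$match` step; it only replaces the aggregation query with a second map-reduce pass.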

answered Feb 23 '26 at 13:02 by sergiuz


