Find the count of maximum consecutive records based on one field in a MongoDB query

I want to find the count of maximum consecutive records based on one particular field.

My db.people collection, sorted by updated_at, looks like this:

> db.people.find().sort({ updated_at: 1})
{ "_id" : 1, "name" : "aaa", "flag" : true, "updated_at" : ISODate("2014-02-07T08:42:48.688Z") }
{ "_id" : 2, "name" : "bbb", "flag" : false, "updated_at" : ISODate("2014-02-07T08:43:10Z") }
{ "_id" : 3, "name" : "ccc", "flag" : true, "updated_at" : ISODate("2014-02-07T08:43:40.660Z") }
{ "_id" : 4, "name" : "ddd", "flag" : true, "updated_at" : ISODate("2014-02-07T08:43:51.567Z") }
{ "_id" : 6, "name" : "fff", "flag" : false, "updated_at" : ISODate("2014-02-07T08:44:23.713Z") }
{ "_id" : 7, "name" : "ggg", "flag" : true, "updated_at" : ISODate("2014-02-07T08:44:44.639Z") }
{ "_id" : 8, "name" : "hhh", "flag" : true, "updated_at" : ISODate("2014-02-07T08:44:51.415Z") }
{ "_id" : 5, "name" : "eee", "flag" : true, "updated_at" : ISODate("2014-02-07T08:55:24.917Z") }

In the records above, there are two places where the flag field is true in consecutive records, i.e.:

record with _id 3 - record with _id 4   (2 consecutive records)

and

record with _id 7 - record with _id 8 - record with _id 5  (3 consecutive records)

However, what I want from the MongoDB query is the maximum number of consecutive records, i.e. 3.
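Outside the database, the number being asked for is simply the longest run of true flags in the sorted list. A minimal plain-JavaScript sketch for reference, assuming the documents are already in an array sorted by updated_at; the people array is copied from the sample output above (only _id and flag kept), and longestTrueRun is a hypothetical helper, not a MongoDB call:

```javascript
// Sample documents from the question, already sorted by updated_at.
// Only _id and flag matter for this computation.
const people = [
  { _id: 1, flag: true },
  { _id: 2, flag: false },
  { _id: 3, flag: true },
  { _id: 4, flag: true },
  { _id: 6, flag: false },
  { _id: 7, flag: true },
  { _id: 8, flag: true },
  { _id: 5, flag: true },
];

// Hypothetical helper: length of the longest run of consecutive flag === true.
function longestTrueRun(docs) {
  let current = 0; // length of the run ending at the current document
  let best = 0;    // longest run seen so far
  for (const doc of docs) {
    current = doc.flag === true ? current + 1 : 0;
    best = Math.max(best, current);
  }
  return best;
}

console.log(longestTrueRun(people)); // 3
```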

Is it possible to get such a result?

I googled it and found a somewhat similar solution using Map-Reduce here: https://stackoverflow.com/a/7408639/1120530.

I am new to MongoDB and couldn't understand the map-reduce documentation, especially how to apply it to the scenario above.

asked Feb 07 '14 07:02 by brg


1 Answer

You can do this with a mapReduce operation.

First, the mapper:

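var mapper = function () {

    // totalCount and counter are globals supplied through "scope"
    // (see the mapReduce call below) and persist between documents.
    if ( this.flag == true ) {
        totalCount++;        // extend the current run of true flags
    } else {
        totalCount = 0;      // a false flag ends the run
    }

    if ( totalCount != 0 ) {
        // Emit every document in a run under that run's key
        emit(
            counter,
            { _id: this._id, totalCount: totalCount }
        );
    } else {
        counter++;           // use a new key for the next run
    }

};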

This keeps a running count of consecutive documents whose flag is true. Whenever that count is non-zero (that is, the current document has flag set to true), we emit a value containing the document's _id and the count so far. A second counter, used as the emit key, is incremented each time flag is false, so that each run of true values gets its own grouping key.
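The way the mapper uses its two scope variables can be traced without a database. A minimal plain-JavaScript simulation over the sample documents in updated_at order; mapOne, emitted and the stand-in emit function are hypothetical names for illustration (inside a real map function, emit is provided by MongoDB):

```javascript
// Globals playing the role of the "scope" variables.
let totalCount = 0;
let counter = 0;
const emitted = [];

// Stand-in for MongoDB's emit(): records (key, value) pairs.
function emit(key, value) {
  emitted.push({ key, value });
}

// Same logic as the mapper, written as a function of the document.
function mapOne(doc) {
  if (doc.flag === true) {
    totalCount++;
  } else {
    totalCount = 0;
  }
  if (totalCount !== 0) {
    emit(counter, { _id: doc._id, totalCount: totalCount });
  } else {
    counter++;
  }
}

// Sample documents in updated_at order (only _id and flag kept).
const sorted = [
  { _id: 1, flag: true }, { _id: 2, flag: false }, { _id: 3, flag: true },
  { _id: 4, flag: true }, { _id: 6, flag: false }, { _id: 7, flag: true },
  { _id: 8, flag: true }, { _id: 5, flag: true },
];
sorted.forEach(mapOne);

for (const { key, value } of emitted) {
  console.log(key, value._id, value.totalCount);
}
// 0 1 1
// 1 3 1
// 1 4 2
// 2 7 1
// 2 8 2
// 2 5 3
```

Tracing it this way shows that every document in a run is emitted, not only the later ones, and that the first sample document (_id 1) forms a run of its own under key 0.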

Then the reducer:

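var reducer = function ( key, values ) {

    var result = { docs: [], totalCount: 0 };

    values.forEach(function(value) {
        // MongoDB may re-reduce, passing an earlier result back in,
        // so accept both { docs, totalCount } and { _id, totalCount }.
        if ( value.docs ) {
            result.docs = result.docs.concat(value.docs);
        } else {
            result.docs.push(value._id);
        }
        // The run length is the largest running count in the group
        result.totalCount = Math.max(result.totalCount, value.totalCount);
    });

    return result;

};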

This collects the _id values for each key into a docs array and records the length of the run in totalCount.
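After mapping, MongoDB groups the emitted values by key and calls reduce only for keys that have more than one value; a key with a single value is written to the output exactly as emitted. A minimal plain-JavaScript sketch of that grouping step, assuming the six (key, value) pairs the mapper produces for the eight sample documents (hard-coded below); the reducer here is written to also accept its own output, because MongoDB may call reduce again on an earlier result:

```javascript
// (key, value) pairs the mapper emits for the eight sample documents.
const emitted = [
  { key: 0, value: { _id: 1, totalCount: 1 } },
  { key: 1, value: { _id: 3, totalCount: 1 } },
  { key: 1, value: { _id: 4, totalCount: 2 } },
  { key: 2, value: { _id: 7, totalCount: 1 } },
  { key: 2, value: { _id: 8, totalCount: 2 } },
  { key: 2, value: { _id: 5, totalCount: 3 } },
];

// Reducer that accepts mapper values and its own earlier results.
function reducer(key, values) {
  const result = { docs: [], totalCount: 0 };
  for (const value of values) {
    if (Array.isArray(value.docs)) {
      result.docs = result.docs.concat(value.docs); // earlier reduce output
    } else {
      result.docs.push(value._id);                  // raw mapper value
    }
    result.totalCount = Math.max(result.totalCount, value.totalCount);
  }
  return result;
}

// Group by key; reduce only keys with more than one value.
const groups = new Map();
for (const { key, value } of emitted) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}
const results = [];
for (const [key, values] of groups) {
  const value = values.length > 1 ? reducer(key, values) : values[0];
  results.push({ _id: key, value });
}

console.log(JSON.stringify(results));
```

With all eight sample documents this yields three keys: key 0 keeps the mapper's shape, { _id: 1, totalCount: 1 }, while keys 1 and 2 hold the docs arrays [3, 4] and [7, 8, 5] with totalCount 2 and 3.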

Then run:

db.people.mapReduce(
    mapper,
    reducer,
    {
        "out": { "inline": 1 },
        "scope": {
            "totalCount": 0,
            "counter": 0
        },
        "sort": { "updated_at": 1 }
    }
)

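Along with the mapper and reducer functions, we define the global variables the mapper uses in "scope" and pass the required "sort" on updated_at. The counts below report 7 input documents, so this run apparently did not include the first sample document (_id 1); with all eight documents, _id 1 would also appear as a run of length 1 under key 0. The result: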

{
    "results" : [
        {
            "_id" : 1,
            "value" : {
                "docs" : [
                    3,
                    4
                ],
                "totalCount" : 2
            }
        },
        {
            "_id" : 2,
            "value" : {
                "docs" : [
                    7,
                    8,
                    5
                ],
                "totalCount" : 3
            }
        }
    ],
    "timeMillis" : 2,
    "counts" : {
        "input" : 7,
        "emit" : 5,
        "reduce" : 2,
        "output" : 2
    },
    "ok" : 1
}

Of course, you could leave totalCount out of the emitted values and just use the length of the docs array, which gives the same number. Since you wanted that count anyway, it is included here; the principle is the same either way.
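The output still lists every run, so the single number the question asks for (3) has to be picked out afterwards. A minimal sketch, assuming res holds the inline output printed above; the literal below only reproduces its "results" part and is not a live query:

```javascript
// The "results" part of the inline mapReduce output shown above.
const res = {
  results: [
    { _id: 1, value: { docs: [3, 4], totalCount: 2 } },
    { _id: 2, value: { docs: [7, 8, 5], totalCount: 3 } },
  ],
};

// Longest run = largest totalCount over all keys (0 if there are no runs).
const longest = res.results.reduce(
  (best, r) => Math.max(best, r.value.totalCount),
  0
);

console.log(longest); // 3
```

Reading value.totalCount rather than docs.length keeps this working for a key that received a single emitted value, since such a key is output in the mapper's { _id, totalCount } shape without a docs array.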

So yes, this was a problem suited to mapReduce, and now you have an example.

answered Oct 02 '22 14:10 by Neil Lunn