MongoDB mapreduce missing data with 'null' in return

Tags:

mapreduce

So this is strange. I'm trying to use mapreduce to group datetime/metrics under a unique port:

Document layout:

{
        "_id" : ObjectId("5069d68700a2934015000000"),
        "port_name" : "CL1-A",
        "metric" : "340.0",
        "port_number" : "0",
        "datetime" : ISODate("2012-09-30T13:44:00Z"),
        "array_serial" : "12345"
}

and mapreduce functions:

var query = {
        'array_serial' : array,
        'port_name' : { $in : ports },
        'datetime' : { $gte : from, $lte : to}

    }

    var map = function() {
        emit( { portname : this.port_name } , { datetime : this.datetime,
                                metric : this.metric });
    }

    var reduce = function(key, values) {
        var res = { dates : [], metrics : [], count : 0}

        values.forEach(function(value){
            res.dates.push(value.datetime);
            res.metrics.push(value.metric);
            res.count++;
        })

        return res;
    }

    var command = {
        mapreduce : collection,
        map : map.toString(),
        reduce : reduce.toString(),
        query : query,
        out : { inline : 1 }

    }

    mongoose.connection.db.executeDbCommand(command, function(err, dbres){
        if(err) throw err;
        console.log(dbres.documents);
        res.json(dbres.documents[0].results);
    })

If a small number of records is requested, say 5 or 10, or even 60, I get all the data back that I'm expecting. Larger queries return truncated values.

I just did some more testing and it seems like it's limiting the record output to 100? This is minutely data, and when I run a query for a 24-hour period I would expect 1440 records back. I just ran it and received 80. :\

Is this expected? I'm not specifying a limit anywhere I can tell...

More data:

Query for records from 2012-10-01T23:00 - 2012-10-02T00:39 (100 minutes) returns correctly:

[
  {
    "_id": {
      "portname": "CL1-A"
    },
    "value": {
      "dates": [
        "2012-10-01T23:00:00.000Z",
        "2012-10-01T23:01:00.000Z",
        "2012-10-01T23:02:00.000Z",
         ...cut...
        "2012-10-02T00:37:00.000Z",
        "2012-10-02T00:38:00.000Z",
        "2012-10-02T00:39:00.000Z"
      ],
      "metrics": [
        "1596.0",
        "1562.0",
        "1445.0",
        ...cut...
        "774.0",
        "493.0",
        "342.0"
      ],
      "count": 100
    }
  }
]

...add one more minute to the query, 2012-10-01T23:00 - 2012-10-02T00:40 (101 minutes):

[
  {
    "_id": {
      "portname": "CL1-A"
    },
    "value": {
      "dates": [
        null,
        "2012-10-02T00:40:00.000Z"
      ],
      "metrics": [
        null,
        "487.0"
      ],
      "count": 2
    }
  }
]

The dbres.documents object shows the expected number of emitted records:

[ { results: [ [Object] ],
    timeMillis: 8,
    counts: { input: 101, emit: 101, reduce: 2, output: 1 },
    ok: 1 } ]

...so is the data getting lost somewhere?


asked Oct 06 '12 03:10

Chris Matta

2 Answers

Rule number one of MapReduce:

Thou shalt return from Reduce the exact same format as the value you emit with your key in Map.

Rule number two of MapReduce:

Thou shalt reduce the array of values passed to Reduce as many times as necessary. The reduce function may be called many times for the same key, including on its own earlier output.

You've broken both of those rules in your implementation of reduce.
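
To see why that matters, here is a minimal sketch in plain JavaScript (no MongoDB needed) that simulates a re-reduce using the reduce function from the question. The batching (100 values first, then the partial result plus one more value) is an assumption chosen to mirror the `reduce: 2` count in the question's output; `brokenReduce` and the sample values are illustrative names, not anything from the original code.

```javascript
// Same logic as the reduce function in the question.
function brokenReduce(key, values) {
  var res = { dates: [], metrics: [], count: 0 };
  values.forEach(function (value) {
    res.dates.push(value.datetime);   // undefined when value is an earlier reduce result
    res.metrics.push(value.metric);   // same problem here
    res.count++;                      // counts the partial result as one record
  });
  return res;
}

// Hypothetical emitted values: 101 one-minute samples in the map output format.
var emitted = [];
for (var i = 0; i < 101; i++) {
  emitted.push({ datetime: 'minute-' + i, metric: String(i) });
}

// First pass: reduce the first 100 values (assumed batch size).
var partial = brokenReduce('CL1-A', emitted.slice(0, 100));

// Second pass (re-reduce): the partial result is fed back in with the last value.
var finalResult = brokenReduce('CL1-A', [partial, emitted[100]]);

console.log(finalResult.count);   // 2
console.log(finalResult.dates);   // [ undefined, 'minute-100' ]
```

The `undefined` entry is what shows up as `null` once the result is serialized, which matches the truncated output in the question.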

Your Map function is emitting key, value pairs.

key: port name (you should simply emit the name as the key, not a document)
value: a document representing three things you need to accumulate (date, metric, count)

Try this instead:

map = function() {  // if you want to reduce to an array you have to emit arrays
    emit(this.port_name, { dates : [this.datetime], metrics : [this.metric], count : 1 });
}

reduce = function(key, values) {  // for each key you get an array of values
    var res = { dates : [], metrics : [], count : 0 };  // you must reduce them to one

    values.forEach(function(value) {
        res.dates = value.dates.concat(res.dates);
        res.metrics = value.metrics.concat(res.metrics);
        res.count += value.count;  // VERY IMPORTANT reduce result may be re-reduced
    });

    return res;
}
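
A quick way to convince yourself that this version survives a re-reduce is to run it outside MongoDB. The sketch below is an illustration only: `toEmittedValue`, `fixedReduce`, the sample data and the 100/1 split are hypothetical, chosen to imitate the two reduce calls reported in the question.

```javascript
// Map output shape from the answer: every emitted value already looks like a reduce result.
function toEmittedValue(doc) {
  return { dates: [doc.datetime], metrics: [doc.metric], count: 1 };
}

// Same logic as the corrected reduce function.
function fixedReduce(key, values) {
  var res = { dates: [], metrics: [], count: 0 };
  values.forEach(function (value) {
    res.dates = value.dates.concat(res.dates);
    res.metrics = value.metrics.concat(res.metrics);
    res.count += value.count;
  });
  return res;
}

// Hypothetical input: 101 one-minute samples for one port.
var emittedValues = [];
for (var i = 0; i < 101; i++) {
  emittedValues.push(toEmittedValue({ datetime: 'minute-' + i, metric: String(i) }));
}

// Reduce everything at once, then in two passes with a re-reduce.
var singlePass = fixedReduce('CL1-A', emittedValues);
var partialResult = fixedReduce('CL1-A', emittedValues.slice(0, 100));
var reReduced = fixedReduce('CL1-A', [partialResult, emittedValues[100]]);

console.log(singlePass.count, reReduced.count);    // 101 101
console.log(reReduced.dates.indexOf(undefined));   // -1
```

Because the emitted value and the reduce result share one shape, it does not matter how the values are split across reduce calls. Note that `concat` is called with the accumulated array as the argument, so the arrays come out in reverse order of processing; sort on the client if order matters.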


answered Oct 05 '22 09:10

Asya Kamsky

Try writing the map-reduce output to a temporary collection instead of returning it in memory. Maybe that is the reason. From the Mongo docs:

{ inline : 1} - With this option, no collection will be created, and the whole map-reduce operation will happen in RAM. Also, the results of the map-reduce will be returned within the result object. Note that this option is possible only when the result set fits within the 16MB limit of a single document. In v2.0, this is your only available option on a replica set secondary.

Also, it may not be the reason, but MongoDB has a data size limitation (2GB) on a 32-bit machine.
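
As a rough sketch of the suggestion above, the command object could send the output to a collection instead of returning it inline. The helper name `buildMapReduceCommand` and the collection names are hypothetical; `out: { replace: ... }` is one of the documented output modes of the mapReduce command. With this mode the results are not returned in the response, so they have to be read from the output collection afterwards.

```javascript
// Builds a mapReduce command that writes its output to a collection.
// All names passed in below are placeholders for illustration.
function buildMapReduceCommand(collection, map, reduce, query, outCollection) {
  return {
    mapreduce: collection,
    map: map.toString(),
    reduce: reduce.toString(),
    query: query,
    out: { replace: outCollection }   // replaces the contents of outCollection on each run
  };
}

var exampleCommand = buildMapReduceCommand(
  'port_metrics',                                       // hypothetical source collection
  function () { emit(this.port_name, { count: 1 }); },  // only stringified here, never called
  function (key, values) {
    var res = { count: 0 };
    values.forEach(function (value) { res.count += value.count; });
    return res;
  },
  { array_serial: '12345' },
  'tmp_port_metrics'                                    // hypothetical output collection
);

console.log(exampleCommand.out);   // { replace: 'tmp_port_metrics' }
```

This only changes where the results are stored; the reduce function still has to cope with being called more than once per key.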

answered Oct 05 '22 10:10

vikas
