Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB mapreduce missing data with 'null' in return

So this is strange. I'm trying to use mapreduce to group datetime/metrics under a unique port:

Document layout:

{
        "_id" : ObjectId("5069d68700a2934015000000"),
        "port_name" : "CL1-A",
        "metric" : "340.0",
        "port_number" : "0",
        "datetime" : ISODate("2012-09-30T13:44:00Z"),
        "array_serial" : "12345"
}

and mapreduce functions:

var query = {
        'array_serial' : array,
        'port_name' : { $in : ports },
        'datetime' : { $gte : from, $lte : to}

    }

    var map = function() {
        emit( { portname : this.port_name } , { datetime : this.datetime,
                                metric : this.metric });
    }

    var reduce = function(key, values) {
        var res = { dates : [], metrics : [], count : 0}

        values.forEach(function(value){
            res.dates.push(value.datetime);
            res.metrics.push(value.metric);
            res.count++;
        })

        return res;
    }

    var command = {
        mapreduce : collection,
        map : map.toString(),
        reduce : reduce.toString(),
        query : query,
        out : { inline : 1 }

    }

    mongoose.connection.db.executeDbCommand(command, function(err, dbres){
        if(err) throw err;
        console.log(dbres.documents);
        res.json(dbres.documents[0].results);
    })

If a small number of records is requested, say 5 or 10, or even 60 I get all the data back I'm expecting. Larger queries return truncated values....


I just did some more testing and it seems like it's limiting the record output to 100? This is minutely data and when I run a query for a 24 hour period I would expect 1440 records back... I just ran it a received 80. :\

Is this expected? I'm not specifying a limit anywhere I can tell...


More data:

Query for records from 2012-10-01T23:00 - 2012-10-02T00:39 (100 minutes) returns correctly:

[
  {
    "_id": {
      "portname": "CL1-A"
    },
    "value": {
      "dates": [
        "2012-10-01T23:00:00.000Z",
        "2012-10-01T23:01:00.000Z",
        "2012-10-01T23:02:00.000Z",
         ...cut...
        "2012-10-02T00:37:00.000Z",
        "2012-10-02T00:38:00.000Z",
        "2012-10-02T00:39:00.000Z"
      ],
      "metrics": [
        "1596.0",
        "1562.0",
        "1445.0",
        ...cut...
        "774.0",
        "493.0",
        "342.0"
      ],
      "count": 100
    }
  }
]

...add one more minute to the query 2012-10-01T23:00 - 2012-10-02T00:39 (101 minutes) :

[
  {
    "_id": {
      "portname": "CL1-A"
    },
    "value": {
      "dates": [
        null,
        "2012-10-02T00:40:00.000Z"
      ],
      "metrics": [
        null,
        "487.0"
      ],
      "count": 2
    }
  }
]

the dbres.documents object shows the correct expected emitted records:

[ { results: [ [Object] ],
    timeMillis: 8,
    counts: { input: 101, emit: 101, reduce: 2, output: 1 },
    ok: 1 } ]

...so is the data getting lost somewhere?

like image 401
Chris Matta Avatar asked Oct 06 '12 03:10

Chris Matta


People also ask

How do I ignore NULL values in MongoDB?

Solution 1: In case preservation of all null values[] or null fields in the array itself is not necessary. Filter out the not null elements using $filter , the ( all null elements) array would be empty, filter that out from documents using $match then $sort on values .

Does MongoDB store null value?

Indeed, it's not possible to store null values in a MongoDB document using a DataFrame. The Python None values are considered as missing attributes accordingly to this NoSQL specific allowance. However, if your column is numerical, you can force writing a null value by setting it to NaN.

How check field is null in MongoDB?

MongoDB fetch documents containing 'null' If we want to fetch documents from the collection "testtable" which contains the value of "interest" is null, the following mongodb command can be used : >db. testtable. find( { "interest" : null } ).

Does MongoDB support mapReduce?

MongoDB supports map-reduce operations on sharded collections. However, starting in version 4.2, MongoDB deprecates the map-reduce option to create a new sharded collection and the use of the sharded option for map-reduce. To output to a sharded collection, create the sharded collection first.


2 Answers

Rule number one of MapReduce:

Thou shall return from Reduce the exact same format that you emit with your key in Map.

Rule number two of MapReduce:

Thou shall reduce the array of values passed to reduce as many times as necessary. Reduce function may be called many times.

You've broken both of those rules in your implementation of reduce.

Your Map function is emitting key, value pairs.

key: port name (you should simply emit the name as the key, not a document)
value: a document representing three things you need to accumulate (date, metric, count)

Try this instead:

map = function() {  // if you want to reduce to an array you have to emit arrays
    emit ( this.port_name, { dates : [this.datetime], metrics : [this.metric], count: 1 });
}

reduce = function(key, values) {        // for each key you get an array of values
   var res = { dates: [], metrics: [], count: 0 };  // you must reduce them to one

   values.forEach(function(value) {
            res.dates = value.dates.concat(res.dates);
            res.metrics = value.metrics.concat(res.metrics);
            res.count += value.count;   // VERY IMPORTANT reduce result may be re-reduced
        }) 

        return res;
    }
like image 184
Asya Kamsky Avatar answered Oct 05 '22 09:10

Asya Kamsky


Try to output the map reduce data in a temp collection instead of in memory. May that is the reason. From Mongo Docs:

{ inline : 1} - With this option, no collection will be created, and the whole map-reduce operation will happen in RAM. Also, the results of the map-reduce will be returned within the result object. Note that this option is possible only when the result set fits within the 16MB limit of a single document. In v2.0, this is your only available option on a replica set secondary.

Also, It may not be the reason but MongoDB has data size limitations (2GB) on a 32bit machine.

like image 26
vikas Avatar answered Oct 05 '22 10:10

vikas