MongoDB MapReduce update in place how to

Basically, I'm trying to order objects by their score over the last hour.

I'm trying to generate an hourly votes sum for objects in my database. Votes are embedded into each object. The object schema looks like this:

{
    _id: ObjectId
    score: int
    hourly-score: int <- need to update this value so I can order by it
    recently-voted: boolean
    votes: {
        "4e4634821dff6f103c040000": { <- Key is __toString of voter ObjectId
            "_id": ObjectId("4e4634821dff6f103c040000"), <- Voter ObjectId
            "a": 1, <- Vote amount
            "ca": ISODate("2011-08-16T00:01:34.975Z"), <- Created at MongoDate
            "ts": 1313452894 <- Created at timestamp
        },
        ... repeat ...
    }
}

This question is related to one I asked a couple of days ago: "Best way to model a voting system in MongoDB".

How would I (can I?) run a MapReduce command to do the following:

  1. Only run on objects with recently-voted = true OR hourly-score > 0.
  2. Calculate the sum of the votes created in the last hour.
  3. Update hourly-score = the sum calculated above, and recently-voted = false.

I also read here that I can perform a MapReduce on the slave DB by running db.getMongo().setSlaveOk() before the M/R command. Could I run the reduce on a slave and update the master DB?

Are in-place updates even possible with Mongo MapReduce?

Marc asked Aug 16 '11 00:08



1 Answer

You can definitely do this. I'll address your questions one at a time:

1. You can specify a query along with your map-reduce, which filters the set of objects which will be passed into the map phase. In the mongo shell, this would look like (assuming m and r are the names of your mapper and reducer functions, respectively):

> db.coll.mapReduce(m, r, {query: {$or: [{"recently-voted": true}, {"hourly-score": {$gt: 0}}]}})

2. The query in step #1 runs your mapper on every document with recently-voted set to true or with a positive hourly-score, but not all of those documents' votes will fall within the last hour. So you'll need to filter in your mapper and emit only the votes you wish to count:

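function m() {
  // vote.ts is stored in seconds, so compute the cutoff in seconds as well.
  var hour_ago = Math.floor(new Date().getTime() / 1000) - 3600;
  // votes is an object keyed by voter id, not an array, so iterate over its keys.
  for (var voter in this.votes) {
    var vote = this.votes[voter];
    if (vote.ts > hour_ago) {
      // Key is an assumption: the object's _id, giving one sum per object.
      emit(this._id, vote.a);
    }
  }
}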

And to reduce:

function r(key, values) {
  var sum = 0;
  values.forEach(function(value) { sum += value; });
  return sum;
}
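
To see how the filter and the sum fit together, here is a minimal sketch that simulates the map and reduce steps in plain JavaScript, with no MongoDB involved. It assumes votes is an object keyed by voter id with ts in seconds, as in the question's schema. The names sampleDoc, mapDoc, reduceValues, and runMapReduce are made up for this illustration, and using the object's _id as the key is an assumption.

```javascript
// Plain-JavaScript simulation of the map and reduce steps (no MongoDB needed).
// All names here are hypothetical and exist only for this sketch.
var nowSeconds = Math.floor(Date.now() / 1000);

var sampleDoc = {
  _id: "object-1",
  votes: {
    "voter-a": { a: 1, ts: nowSeconds - 60 },    // one minute ago: counted
    "voter-b": { a: 1, ts: nowSeconds - 1800 },  // thirty minutes ago: counted
    "voter-c": { a: -1, ts: nowSeconds - 7200 }  // two hours ago: ignored
  }
};

// Mapper: emits (object id, vote amount) for each vote newer than the cutoff.
function mapDoc(doc, emit) {
  var hourAgo = nowSeconds - 3600;
  Object.keys(doc.votes).forEach(function (voterId) {
    var vote = doc.votes[voterId];
    if (vote.ts > hourAgo) {
      emit(doc._id, vote.a); // assumed key: the object's _id
    }
  });
}

// Reducer: sums the emitted vote amounts for one key.
function reduceValues(key, values) {
  var sum = 0;
  values.forEach(function (value) { sum += value; });
  return sum;
}

// Group emitted values by key, then reduce each group.
function runMapReduce(docs) {
  var groups = {};
  docs.forEach(function (doc) {
    mapDoc(doc, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  var results = {};
  Object.keys(groups).forEach(function (key) {
    results[key] = reduceValues(key, groups[key]);
  });
  return results;
}

var hourlyScores = runMapReduce([sampleDoc]);
console.log(hourlyScores); // object-1 maps to 2 (the two recent votes)
```

Only the two votes inside the last hour contribute, so the two-hour-old downvote does not affect the sum.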

3. To update the hourly scores collection, you can use the reduce output mode of map-reduce (out: {reduce: ...}), which calls your reducer with both the newly emitted values and the value previously saved in the output collection, if any. The result of that pass is saved into the output collection. This looks like:

> db.coll.mapReduce(m, r, {query: ..., out: {reduce: "output_coll"}})

Besides re-reducing the output, there are three other output modes. merge overwrites documents in the output collection with newly created ones, but leaves behind any documents whose _id differs from the _ids created by your m-r job. replace is effectively a drop-and-create operation and is the default. {inline: 1} returns the results directly to the shell or to your driver. Note that when using {inline: 1}, your results must fit in the size allowed for a single document (16MB in recent MongoDB releases).
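
The difference between the replace, merge, and reduce modes described above can be sketched with an in-memory simulation. This is not MongoDB code: applyOutput and sumReducer are hypothetical helpers, and the "collections" are plain objects keyed by _id.

```javascript
// In-memory simulation of the replace, merge, and reduce output modes.
// applyOutput is a hypothetical helper; collections are plain objects keyed by _id.
function sumReducer(key, values) {
  return values.reduce(function (total, value) { return total + value; }, 0);
}

function applyOutput(mode, existing, fresh, reducer) {
  var result = {};
  if (mode !== "replace") {
    // merge and reduce both keep documents the new job did not produce
    Object.keys(existing).forEach(function (id) { result[id] = existing[id]; });
  }
  Object.keys(fresh).forEach(function (id) {
    if (mode === "reduce" && Object.prototype.hasOwnProperty.call(existing, id)) {
      // reduce: combine the stored value with the new one via the reducer
      result[id] = reducer(id, [existing[id], fresh[id]]);
    } else {
      // replace and merge: the new value wins
      result[id] = fresh[id];
    }
  });
  return result;
}

var existing = { a: 5, b: 1 }; // previous contents of the output collection
var fresh = { a: 2, c: 4 };    // results of the new job

var replaced = applyOutput("replace", existing, fresh, sumReducer); // { a: 2, c: 4 }
var merged = applyOutput("merge", existing, fresh, sumReducer);     // { a: 2, b: 1, c: 4 }
var reduced = applyOutput("reduce", existing, fresh, sumReducer);   // { a: 7, b: 1, c: 4 }
```

With a summing reducer, the reduce mode adds the new sum to the stored one, which is worth keeping in mind if the stored value is meant to reflect only the last hour.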

(4.) You can run map-reduce jobs on secondaries ("slaves"), but since secondaries cannot accept writes (that's what makes them secondary), you can only do this when using inline output.
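
If you go the inline route, one possible way to get the sums back into the objects is to turn each result row into an update against the primary. The sketch below only builds the update specs as plain objects and does not talk to MongoDB. buildHourlyUpdates is a hypothetical helper, and it assumes the rows have the { _id, value } shape that inline output returns, with the object's _id as the map-reduce key.

```javascript
// Hypothetical helper: turn inline map-reduce results into update specs
// for the primary. Nothing here connects to MongoDB; it only builds objects.
function buildHourlyUpdates(inlineResults) {
  return inlineResults.map(function (row) {
    return {
      // match the object whose _id was used as the map-reduce key (assumption)
      filter: { _id: row._id },
      // set the fields named in the question's schema
      update: { $set: { "hourly-score": row.value, "recently-voted": false } }
    };
  });
}

// Sample rows in the { _id, value } shape of inline output.
var sampleResults = [
  { _id: "object-1", value: 2 },
  { _id: "object-2", value: -1 }
];

var updates = buildHourlyUpdates(sampleResults);
```

Each filter and update pair could then be passed to your driver's update call on the primary. Objects whose votes all fell outside the last hour emit nothing and so would not appear in the results, so they would presumably need their hourly-score reset separately.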

dcrosta answered Nov 15 '22 06:11
