
Mongoose / MongoDB: count elements in array

I'm trying to count the number of occurrences of a string in an array in my collection using Mongoose. My "schema" looks like this:

var ThingSchema = new Schema({
  tokens: [ String ]
});

My objective is to get the top 10 "tokens" in the "Thing" collection, which can contain multiple values per document. For example:

var documentOne = {
    _id: ObjectId('50ff1299a6177ef9160007fa')
  , tokens: [ 'foo' ]
}

var documentTwo = {
    _id: ObjectId('50ff1299a6177ef9160007fb')
  , tokens: [ 'foo', 'bar' ]
}

var documentThree = {
    _id: ObjectId('50ff1299a6177ef9160007fc')
  , tokens: [ 'foo', 'bar', 'baz' ]
}

var documentFour = {
    _id: ObjectId('50ff1299a6177ef9160007fd')
  , tokens: [ 'foo', 'baz' ]
}

...would give me this result:

{ foo: 4, bar: 2, baz: 2 }

I'm considering MapReduce or the aggregation framework for this task, but I'm not certain which is the better option.

Eric Martindale asked Jan 31 '13 02:01



1 Answer

Aha, I've found the solution. MongoDB's aggregation framework lets us run a series of stages over a collection. Of particular note is $unwind, which splits each document into one document per element of an array field, so the elements can be grouped and counted en masse.
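To make the $unwind and $group steps concrete, here is a minimal plain-JavaScript sketch that mimics them in memory on the four sample documents from the question. The helper names unwindTokens and groupAndCount are made up for illustration and are not MongoDB or Mongoose APIs; the real $unwind also carries the document's other fields (such as _id) along, which this sketch drops.

```javascript
// Sample documents from the question (the _id values are omitted for brevity).
const things = [
  { tokens: ['foo'] },
  { tokens: ['foo', 'bar'] },
  { tokens: ['foo', 'bar', 'baz'] },
  { tokens: ['foo', 'baz'] }
];

// Rough stand-in for { $unwind: '$tokens' }: one output document per array element.
function unwindTokens(docs) {
  const out = [];
  for (const doc of docs) {
    for (const token of doc.tokens) {
      out.push({ tokens: token });
    }
  }
  return out;
}

// Rough stand-in for { $group: { _id: '$tokens', count: { $sum: 1 } } }.
function groupAndCount(unwound) {
  const counts = new Map();
  for (const doc of unwound) {
    counts.set(doc.tokens, (counts.get(doc.tokens) || 0) + 1);
  }
  return Array.from(counts, ([token, count]) => ({ _id: token, count: count }));
}

const unwound = unwindTokens(things);
const grouped = groupAndCount(unwound);

console.log(unwound.length); // 8 single-token documents
console.log(grouped);        // foo: 4, bar: 2, baz: 2
```

The point of the sketch is only that counting becomes trivial once every array element is its own document; in MongoDB the same work happens server-side inside the pipeline.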

MongooseJS exposes this very accessibly on a model. Using the example above, this looks as follows:

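Thing.aggregate([
    { $match: { /* Query can go here, if you want to filter results. */ } }
  , { $project: { tokens: 1 } } /* pass only the tokens field on to the next stage */
  , { $unwind: '$tokens' } /* emit one document per array element, so each token can be counted */
  , { $group: { /* group the unwound documents by token */
          _id: { token: '$tokens' } /* use the token value as the group key */
        , count: { $sum: 1 } /* add 1 for every occurrence */
      }
    }
  , { $sort: { count: -1, '_id.token': 1 } } /* most frequent first; ties broken alphabetically */
  , { $limit: 10 } /* keep only the top 10 tokens */
]).exec().then(function(topTokens) {
  console.log(topTokens);
  // [ { _id: { token: 'foo' }, count: 4 },
  //   { _id: { token: 'bar' }, count: 2 },
  //   { _id: { token: 'baz' }, count: 2 } ]
}).catch(function(err) {
  console.error(err);
});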
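Note that the $group stage yields an array of documents shaped like { _id: { token: ... }, count: ... } rather than a single token-to-count object. If the flat shape from the question is what you need, a small helper along these lines should do it. This is a sketch: toCountMap is a hypothetical name, and sampleResults is a hand-written stand-in for the pipeline output on the four example documents, not data fetched from MongoDB.

```javascript
// Hypothetical helper (not part of Mongoose): flattens $group output into { token: count }.
function toCountMap(results) {
  const counts = {};
  for (const row of results) {
    counts[row._id.token] = row.count;
  }
  return counts;
}

// Hand-written stand-in for the pipeline output on the four sample documents.
const sampleResults = [
  { _id: { token: 'foo' }, count: 4 },
  { _id: { token: 'bar' }, count: 2 },
  { _id: { token: 'baz' }, count: 2 }
];

console.log(toCountMap(sampleResults)); // { foo: 4, bar: 2, baz: 2 }
```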

It is noticeably faster than MapReduce in preliminary tests across ~200,000 records, and thus likely scales better, but this is only after a cursory glance. YMMV.
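For comparison, the MapReduce approach boils down to a map function that emits (token, 1) for every array element and a reduce function that sums the emitted values. The sketch below runs that logic in memory so the two functions can be read side by side with the pipeline. runMapReduce is a hypothetical driver written for this example, not a MongoDB or Mongoose API, and it says nothing about relative performance.

```javascript
// Stand-in for a MongoDB map function. In MongoDB itself the document is `this`
// and emit() is a global; here both are passed in so the sketch runs on its own.
function mapTokens(doc, emit) {
  doc.tokens.forEach(function (token) {
    emit(token, 1);
  });
}

// Reduce function: sums the 1s emitted for a single token.
function reduceCounts(key, values) {
  return values.reduce(function (sum, value) { return sum + value; }, 0);
}

// Hypothetical in-memory driver: collects emitted values per key, then reduces each key.
function runMapReduce(docs, map, reduce) {
  const emitted = new Map();
  docs.forEach(function (doc) {
    map(doc, function (key, value) {
      if (!emitted.has(key)) emitted.set(key, []);
      emitted.get(key).push(value);
    });
  });
  const results = [];
  emitted.forEach(function (values, key) {
    results.push({ _id: key, value: reduce(key, values) });
  });
  return results;
}

// Sample documents from the question (the _id values are omitted for brevity).
const sampleThings = [
  { tokens: ['foo'] },
  { tokens: ['foo', 'bar'] },
  { tokens: ['foo', 'bar', 'baz'] },
  { tokens: ['foo', 'baz'] }
];

const tokenCounts = runMapReduce(sampleThings, mapTokens, reduceCounts);
console.log(tokenCounts); // foo: 4, bar: 2, baz: 2
```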

Eric Martindale answered Sep 28 '22 09:09
