Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB group by array inner-elements

I've got a list of articles, and each of them has an array property which lists various individuals mentioned in them:

_id: {     $oid: "52b632a9e4f2ba13c82ccd23" }, providerName: "The Guardian", url: "http://feeds.theguardian.com/c/34708/f/663860/s/3516cebc/sc/38/l/0L0Stheguardian0N0Cmusic0C20A130Cdec0C220Cwaterboys0Efishermans0Eblues0Etour0Ehammersmith/story01.htm", subject: "The Waterboys – review", class_artist: [     "paul mccartney" ] 

I've been trying (unsuccessfully) to get a list of all the individual artists (class_artist), based on the number of articles they've been tagged in within the past 7 days.

I've gotten as far as:

var date = new Date(); date.setDate(date.getDate() - 7);  db.articles.group({     key: { class_artist: 1 },     cond: { class_date: { $gt: date } },     reduce: function ( curr, result ) { result.cnt++; },     initial: { cnt : 0 } }).sort({cnt: -1}); 

But unfortunately, it doesn't count them based on the individual array values, but by array compositions (that is, lists of artists).

I tried using the $unwind function, but have not been able to make it work.

like image 316
Gil Adirim Avatar asked Feb 02 '14 09:02

Gil Adirim


1 Answers

What framework are you using? This is not MongoDB shell and looks like some weird wrapper around MapReduce. In that case $unwind would not be available, and you need it for user in the aggregation framework. Here's what you want in the mongo shell:

db.articles.aggregate([   {$match: { class_date: { $gte: date } } },   {$project: { _id: 0, class_artist: 1 } },   {$unwind: "$class_artist" },   {$group: { _id: "$class_artist", tags: { $sum: 1 } }},   {$project: { _id: 0,class_artist: "$_id", tags: 1 } },   {$sort: { tags: -1 } } ]) 

So efficiently:

  1. Filter by date because you already set a var for the last 7 days
  2. Project only the field(s) we need { We need only one! }
  3. Unwind the array so we now have a record for every array element in every document
  4. Group on the Artist from the expanded documents
  5. Project into a document format you can use as group messed around with _id
  6. Sort the results in reverse order to see the top tagged first

And the great thing about aggregation is you can gradually build up those stages to see what is going on.

Shake and bake into your own driver implmentation or ODM framework as required.

like image 99
Neil Lunn Avatar answered Sep 20 '22 01:09

Neil Lunn