 

Use MongoDB aggregation framework to group by length of array

I have a collection that looks something like this:

{
    "_id": "id0",
    "name": "...",
    "saved_things": [
        { ... },
        { ... },
        { ... },
    ]
}
{
    "_id": "id1",
    "name": "...",
    "saved_things": [
        { ... },
    ]
}
{
    "_id": "id2",
    "name": "...",
    "saved_things": [
        { ... },
    ]
}

etc...

I want to use MongoDB's aggregation framework to produce a histogram that tells how many users have a given number of saved_things. For example, for the dataset above it could return something like:

{ "_id": 1, "count": 2 },
{ "_id": 3, "count": 1 }
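To pin down what the desired histogram means, here is a minimal plain-JavaScript sketch that computes it in memory for the three sample documents above. It does not touch MongoDB; the function name `histogramBySize` is hypothetical, the array items are placeholders, and every document is assumed to carry a `saved_things` array.

```javascript
// Hypothetical helper: counts how many documents have each saved_things length.
// Plain JavaScript over an in-memory array; assumes saved_things is always an array.
function histogramBySize(docs) {
  const counts = new Map(); // array length -> number of documents with that length
  for (const doc of docs) {
    const size = doc.saved_things.length;
    counts.set(size, (counts.get(size) || 0) + 1);
  }
  // Shape each entry like the desired aggregation output, smallest size first.
  return [...counts.entries()]
    .map(([size, count]) => ({ _id: size, count }))
    .sort((a, b) => a._id - b._id);
}

// Sample data mirroring the question's documents (item contents are placeholders).
const sampleDocs = [
  { _id: "id0", name: "...", saved_things: [{}, {}, {}] },
  { _id: "id1", name: "...", saved_things: [{}] },
  { _id: "id2", name: "...", saved_things: [{}] },
];

console.log(histogramBySize(sampleDocs));
// logs [ { _id: 1, count: 2 }, { _id: 3, count: 1 } ]
```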

I've tried various combinations of aggregate functions like the one below, but none have worked out correctly. (I get the feeling that I'm going about this terribly wrong.)

collection.aggregate([
    { $unwind: "$saved_things" },
    { $group: "$_id", count: { $sum: 1 } } },
    { $group: "$count", number: { $sum: 1 } } },
    { $sort: { number: -1 } }
], function(err, result) {
    console.log(result);
});

Is this possible with Mongo's aggregation framework, or would I be better off with a map-reduce function?

asked Jul 30 '13 18:07 by Steve Gattuso



2 Answers

OK, got it! Here we go. The aggregation pipeline is basically this:

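collection.aggregate([
    // One document per array element (documents with an empty or missing array are dropped)
    { $unwind: "$saved_things" },
    // Count the unwound documents per original _id: that count is the array length
    { $group: { _id: "$_id", size: { $sum: 1 } } },
    // Count how many original documents share each array length
    { $group: { _id: "$size", frequency: { $sum: 1 } } },
    // Rename _id to size and hide the grouping key
    { $project: { size: "$_id", frequency: 1, _id: 0 } }
], function(err, result) {
    console.log(result);
});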

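Unwind the saved_things array, then group by the document _id and count: the number of unwound documents per _id is the array's size. From there it is easy: group by size and count the frequency. Finally, use $project to rename the _id field to size. Note that $unwind emits nothing for documents whose saved_things array is empty or missing, so those users never show up with a size of 0.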
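The four stages can be mirrored in plain JavaScript to see what each one produces. This is a hypothetical in-memory sketch (the function name `simulatePipeline` and the sample documents are made up for illustration), not how MongoDB executes the pipeline internally. It also shows that a document with an empty saved_things array disappears at the $unwind step, so no size-0 bucket appears.

```javascript
// Hypothetical in-memory mirror of: $unwind -> $group by _id -> $group by size -> $project.
// Assumes saved_things is an array whenever it is present.
function simulatePipeline(docs) {
  // $unwind: one output document per array element; empty or missing arrays yield nothing.
  const unwound = [];
  for (const doc of docs) {
    for (const thing of doc.saved_things || []) {
      unwound.push({ _id: doc._id, saved_things: thing });
    }
  }

  // $group { _id: "$_id", size: { $sum: 1 } }: count unwound documents per original _id.
  const sizeById = new Map();
  for (const d of unwound) {
    sizeById.set(d._id, (sizeById.get(d._id) || 0) + 1);
  }

  // $group { _id: "$size", frequency: { $sum: 1 } }: count how many _ids share each size.
  const freqBySize = new Map();
  for (const size of sizeById.values()) {
    freqBySize.set(size, (freqBySize.get(size) || 0) + 1);
  }

  // $project { size: "$_id", frequency: 1, _id: 0 }: expose the grouping key as size.
  return [...freqBySize.entries()].map(([size, frequency]) => ({ size, frequency }));
}

const docs = [
  { _id: "id0", saved_things: [{}, {}, {}] },
  { _id: "id1", saved_things: [{}] },
  { _id: "id2", saved_things: [{}] },
  { _id: "id3", saved_things: [] }, // dropped at the $unwind step
];

// Buckets: size 3 (frequency 1) and size 1 (frequency 2); id3 contributes nothing.
console.log(simulatePipeline(docs));
```

MongoDB does not guarantee the order of $group output, which is why the question's attempt added a $sort stage; the sketch's order simply follows Map insertion order.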

answered Jan 01 '23 09:01 by Miguel Cartagena


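You can use the $size operator. Example:

Query:

// Pipeline to pass to collection.aggregate(): groups documents by the length of saved_things
[{ 
   $group: {
     _id: { $size: '$saved_things' },
     total: { $sum: 1 },
   }
}]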

Output (for a collection in which two documents each have four saved_things):

[{ _id: 4, total: 2 }]
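$size raises an error when saved_things is missing or does not resolve to an array. One hedged way to cope with missing or null fields is to wrap the field in $ifNull so such documents count as size 0; documents where saved_things holds some other non-array value would still make $size fail. The sketch below only builds the pipeline as a plain array; `histogramPipeline` is a hypothetical name, the accumulator is called `count` to match the question's desired output, and the usage comment assumes a MongoDB Node.js driver `collection` object like the one in the question.

```javascript
// Hypothetical histogram pipeline based on the $size answer.
// $ifNull substitutes an empty array when saved_things is missing or null,
// so those documents land in the size-0 bucket instead of making $size error.
const histogramPipeline = [
  {
    $group: {
      _id: { $size: { $ifNull: ["$saved_things", []] } },
      count: { $sum: 1 },
    },
  },
  // $group output order is not guaranteed, so order buckets by size, smallest first.
  { $sort: { _id: 1 } },
];

// Usage, assuming `collection` is a MongoDB Node.js driver Collection:
//   const result = await collection.aggregate(histogramPipeline).toArray();
//   For the question's three sample documents this gives
//   [{ _id: 1, count: 2 }, { _id: 3, count: 1 }]

console.log(JSON.stringify(histogramPipeline, null, 2));
```

Unlike the $unwind version, this needs only one pass over the documents and keeps users with no saved_things in the histogram.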

answered Jan 01 '23 10:01 by Ritesh Vishwakarma