
MongoDB aggregation on multiple collections

I need to create an aggregation that runs over multiple collections with a similar structure. I know about the $lookup stage, but I do not actually want to join the documents; I want to treat all the documents from all the collections as one list. To clarify my intention, here is an example.

Students collection:

{
     "_id" : ObjectId("57278a449fb5ba91248b3bc0"),
     "age": 22
}

Teachers collection:

{
     "_id" : ObjectId("57278a449fb5ba91248b3bc0"),
     "age": 28
}

I want to create an aggregation that gives me the average age across both collections together. How can I do it without running two aggregations and combining the results in my own code?

Shelef asked Mar 29 '17 11:03





1 Answer

You can do this in a single aggregation by using the pipeline form of $lookup (available since MongoDB 3.6), like this:

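db.getCollection('students').aggregate(
    [
        // Collapse the starting collection into one document ({ _id: 0 })
        // so that each $lookup below runs exactly once.
        {
            $group: {
                '_id': 0
            }
        },
        // Uncorrelated sub-pipeline: average age and document count for students.
        {
            $lookup: {
                from: 'students',
                let: {},
                pipeline: [
                    { $group: {
                        '_id': 0,
                        'avg': { $avg: '$age' },
                        'count': { $sum: 1 }
                    } }
                ],
                as: 'students'
            }
        },
        // Same sub-pipeline for teachers.
        {
            $lookup: {
                from: 'teachers',
                let: {},
                pipeline: [
                    { $group: {
                        '_id': 0,
                        'avg': { $avg: '$age' },
                        'count': { $sum: 1 }
                    } }
                ],
                as: 'teachers'
            }
        },
        // Each $lookup result is a one-element array; unwind it into a subdocument.
        {
            $unwind: {
                path : '$students',
            }
        },
        {
            $unwind: {
                path : '$teachers',
            }
        },
        // Weighted average: (avg * count summed over both) / (total count).
        {
            $project: {
                'avg_age': { $divide: [
                    { $sum: [
                        { $multiply: [ '$students.avg', '$students.count' ] },
                        { $multiply: [ '$teachers.avg', '$teachers.count' ] }
                    ] },
                    { $sum: [ '$students.count', '$teachers.count' ] },
                ] }
            }
        },
    ]
);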

The first $group stage collapses the students collection into a single document ({ _id: 0 }) to start from, so each $lookup is executed only once. You can then combine the averages from the two collections by weighting each one by its document count. This gives the same result as taking the average over all the documents together.
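To see why the weighting by count matters, here is a minimal sketch in plain JavaScript (no MongoDB needed). The sample ages and the helper names summarize and combinedAverage are made up for illustration; they are not from the question. It mirrors the inner $group and the final $project, and compares the result with a direct average over all documents and with a naive average of the two averages.

```javascript
// Hypothetical sample ages for illustration only; not taken from the question.
const students = [22, 24, 20];
const teachers = [28, 40];

// Mirrors the inner $group: per-collection average and document count.
function summarize(ages) {
  const count = ages.length;
  const avg = ages.reduce((sum, age) => sum + age, 0) / count;
  return { avg, count };
}

// Mirrors the final $project: weight each average by its count,
// then divide by the total count.
function combinedAverage(a, b) {
  return (a.avg * a.count + b.avg * b.count) / (a.count + b.count);
}

const s = summarize(students);
const t = summarize(teachers);

const weighted = combinedAverage(s, t);                  // what the pipeline computes
const direct = summarize(students.concat(teachers)).avg; // average over all documents
const naive = (s.avg + t.avg) / 2;                       // ignores the counts

console.log(weighted, direct, naive); // prints: 26.8 26.8 28
```

The weighted result matches the direct average, while the naive average of averages is off because the two collections have different sizes.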

Sam Tolmay answered Oct 29 '22 09:10
