MongoDB aggregation: $group followed by $limit for pagination

In a MongoDB aggregation pipeline, do records flow from stage to stage one at a time (or in batches), or does each stage wait until it has processed the whole collection before passing anything to the next stage?

For example, I have a collection classtest with the following sample records:

{name: "Person1", marks: 20}
{name: "Person2", marks: 20}
{name: "Person1", marks: 20}

I have a total of 1000 records for about 100 students, and I run the following aggregate query:

    db.classtest.aggregate(
    [
        {$sort: {name: 1}},
        {$group: {_id: '$name',
            total: {$sum: '$marks'}}},
        {$limit: 5}
    ])

I have the following questions.

  1. The sort order is lost in the final results. If I place another $sort after $group, the results are sorted properly. Does that mean $group does not maintain the previous sort order?
  2. I would like to limit the results to 5. Does the group operation have to complete for all 1000 records before anything is passed to $limit, or does $group pass records to the $limit stage as soon as it has them and stop processing once the limit is met?

My actual goal is to paginate the results of the aggregation. In the above scenario, if $group maintained the sort order and processed only the required number of records, I would apply a $match condition such as {name: {$gt: lastPersonName}} in subsequent page queries.

  1. I do not want to apply $limit before $group, as I want results for 5 students, not the first 5 records.
  2. I would rather not use $skip, as that effectively means traversing that many records.
Poorna Subhash asked Aug 18 '15 06:08

2 Answers

I solved the problem without maintaining another collection and without having $group traverse the whole collection, so I am posting my own answer.

As others have pointed out:

  1. $group doesn't retain the order of its input, so sorting before it doesn't help much.
  2. $group isn't optimized for a following $limit: it still runs over its entire input (here, the whole collection).

My use case has the following features, which helped me solve it:

  1. There is a maximum of 10 records per student (and a minimum of 1).
  2. I am not very particular about page size. The front end can handle varying page sizes.

The following is the aggregation command I used.

    db.classtest.aggregate(
    [
        {$sort: {name: 1}},
        {$limit: 5 * 10},
        {$group: {_id: '$name',
            total: {$sum: '$marks'}}},
        {$sort: {_id: 1}}
    ])
    

Explanation of the above:

  1. If $sort immediately precedes $limit, the framework optimizes the amount of data sent to the next stage (the $sort + $limit coalescence described in the MongoDB aggregation pipeline optimization documentation).
  2. To get a minimum of 5 groups (the page size), I need to pass at least 5 (page size) * 10 (max records per student) = 50 records to the $group stage. With this, the size of the final result may be anywhere between 0 and 50.
  3. If the result size is less than 5, then no further pagination is required.
  4. If the result size is greater than 5, the last student's records may not have been completely processed (i.e., not all of that student's records were grouped), so I discard the last record from the result.
  5. The name in the last record (among the retained results) is then used as the $match criterion in the subsequent page request, as shown below.

    db.classtest.aggregate(
    [
        {$match: {name: {$gt: lastRecordName}}},
        {$sort: {name: 1}},
        {$limit: 5 * 10},
        {$group: {_id: '$name',
            total: {$sum: '$marks'}}},
        {$sort: {_id: 1}}
    ])
    

In the above, the framework still optimizes $match, $sort and $limit together as a single operation, which I have confirmed through the explain plan.
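The two queries and the trimming rules above can be sketched as a pair of plain JavaScript helpers. This is only an illustration under the answer's assumptions (at most 10 records per student, page size 5); the names buildPagePipeline, trimPage and nextAfter are hypothetical and do not come from MongoDB or from the original answer.

```javascript
// Hypothetical helpers; names and defaults are illustrative assumptions.

// Build the pipeline for one page. `lastName` is null for the first page.
function buildPagePipeline(lastName, pageSize = 5, maxPerStudent = 10) {
  const pipeline = [];
  if (lastName !== null) {
    // Resume strictly after the last student kept on the previous page.
    pipeline.push({ $match: { name: { $gt: lastName } } });
  }
  pipeline.push(
    { $sort: { name: 1 } },
    // Enough raw records to cover pageSize students in the worst case.
    { $limit: pageSize * maxPerStudent },
    { $group: { _id: '$name', total: { $sum: '$marks' } } },
    { $sort: { _id: 1 } }
  );
  return pipeline;
}

// Apply the trimming rules to the grouped result of one page.
function trimPage(groups, pageSize = 5) {
  if (groups.length < pageSize) {
    // Fewer groups than a page: $limit was not reached, so nothing is left.
    return { items: groups, nextAfter: null };
  }
  // More than a page: the last group may be cut off by $limit, so drop it.
  const items = groups.length > pageSize ? groups.slice(0, -1) : groups;
  return { items, nextAfter: items[items.length - 1]._id };
}
```

A caller would pass buildPagePipeline(null) to db.classtest.aggregate for the first page, run trimPage on the resulting array, and feed nextAfter into buildPagePipeline for the next page until nextAfter is null.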

Poorna Subhash answered Oct 01 '22 14:10

Pagination on grouped data in MongoDB

You cannot paginate directly inside the groups produced by $group, but the following trick works.

For example, suppose I want to group products by category and return only 5 products per category.

Step 1: write an aggregation on the products collection and group by category, pushing each product into an array:

        { $group: { _id: '$prdCategoryId', products: { $push: '$$ROOT' } } },

Step 2: slice the array, using prdSkip to skip products and prdLimit to limit them; pass both in dynamically:

        {
            $project: {
                // pagination for products
                products: {
                    $slice: ['$products', prdSkip, prdLimit],
                }
            }
        },
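For a non-negative prdSkip and a positive prdLimit, the three-argument $slice above behaves like Array.prototype.slice in plain JavaScript. The sketch below only illustrates that correspondence; sliceProducts and the sample array are hypothetical.

```javascript
// Plain-JavaScript equivalent of { $slice: [array, prdSkip, prdLimit] }
// for a non-negative prdSkip and a positive prdLimit.
function sliceProducts(products, prdSkip, prdLimit) {
  return products.slice(prdSkip, prdSkip + prdLimit);
}

// Hypothetical sample data: seven products in one category.
const sampleProducts = ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7'];
console.log(sliceProducts(sampleProducts, 0, 5)); // first page of products
console.log(sliceProducts(sampleProducts, 5, 5)); // second page, shorter than 5
```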

The final query looks like this. The parameters limit and skip paginate the categories, and prdSkip and prdLimit paginate the products within each category:

    db.products.aggregate([

        { $group: { _id: '$prdCategoryId', products: { $push: '$$ROOT' } } },
        {
            $lookup: {
                from: 'categories',
                localField: '_id',
                foreignField: '_id',
                as: 'categoryProducts',
            },
        },
        {
            $replaceRoot: {
                newRoot: {
                    $mergeObjects: [{ $arrayElemAt: ['$categoryProducts', 0] }, '$$ROOT'],
                },
            },
        },
        {
            $project: {
                // pagination for products
                products: {
                    $slice: ['$products', prdSkip, prdLimit],
                },
                _id: 1,
                catName: 1,
                catDescription: 1,
            },
        },
        // pagination for categories: sort for a stable order, then skip before limit
        { $sort: { _id: 1 } },
        { $skip: skip },
        { $limit: limit },
    ]);

I used $replaceRoot here to pull the category fields up to the top level.
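If the front end works with page numbers, the skip and limit values for both levels can be derived with a small helper. This is a sketch assuming 1-based page numbers; toSkipLimit is a hypothetical name, not part of MongoDB or of this answer.

```javascript
// Hypothetical helper: convert a 1-based page number into skip/limit values.
function toSkipLimit(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page must be an integer >= 1');
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError('pageSize must be an integer >= 1');
  }
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

// Second page of categories, 10 per page.
const { skip, limit } = toSkipLimit(2, 10);
// First page of products within each category, 5 per page.
const { skip: prdSkip, limit: prdLimit } = toSkipLimit(1, 5);
console.log(skip, limit, prdSkip, prdLimit);
```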

Akshay Dhawle answered Oct 01 '22 14:10
