Mongodb aggregation - sort makes the query very slow

Tags:

mongodb

aggregation-framework

MongoDB 3.2, installed on CentOS 6, with plenty of RAM and disk. I have a collection with 10K documents of the following structure:

{
  "id":5752034,
  "score":7.6,
  "name":"ASUS X551 15.6-inch Laptop", 
  "categoryId":"803",
  "positiveAspects":[{
                       "id":30030525,
                       "name":"price",
                       "score":9.8,
                       "frequency":139,
                       "rank":100098
                     },
                     {
                       "id":30028399,
                       "name":"use",
                       "score":9.9,
                       "frequency":99,
                       "rank":100099
                     }
                     .
                     .
                ]
}

In each document, the nested array positiveAspects has a few hundred elements.

The collection has the following indexes:

{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "proddb.product_trees" }
{ "v" : 1, "key" : { "positiveAspects.id" : 1.0, "positiveAspects.score" : 1.0 }, "name" : "positiveAspects.id_1_positiveAspects.score_1", "ns" : "proddb.product_trees" }
{ "v" : 1, "key" : { "categoryId" : 1.0, "score" : 1.0 }, "name" : "categoryId_1_score_1", "ns" : "proddb.product_trees" }
{ "v" : 1, "key" : { "rank" : -1.0 }, "name" : "rank_-1", "ns" : "proddb.product_trees" }
{ "v" : 1, "key" : { "positiveAspects.rank" : -1.0 }, "name" : "positiveAspects.rank_-1", "ns" : "proddb.product_trees" }

I would like to run the following aggregation, which takes about 40 seconds:

{  
  aggregate:"product_trees",
  pipeline:[  
  {  
     $match:{  
        categoryId:"803",
        score:{  
           $gte:8.0
        }
     }
  },
  {  
     $unwind:"$positiveAspects"
  },
  {  
     $match:{  
        "positiveAspects.id":30030525,
        "positiveAspects.score":{
           $gte:9.0
        }
     }
  },
  {  
     $sort:{  
        "positiveAspects.rank":-1
     }
  },
  {  
     $project:{  
        _id:0,
        score:1,
        id:1,
        name:1,
        positiveAspects:1
     }
  },
  {  
     $limit:10
  }
 ]
}

With the following explain:

2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Beginning planning...
=============================
Options = NO_BLOCKING_SORT INDEX_INTERSECTION
Canonical query:
ns=proddb.product_treesTree: $and
    categoryId == "803"
    score $gte 8.0
Sort: {}
Proj: {}
=============================
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Index 0 is kp: { _id: 1 } unique name: '_id_' io: { v: 1, key: { _id: 1 }, name: "_id_", ns: "proddb.product_trees" }
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Index 1 is kp: { positiveAspects.id: 1.0, positiveAspects.score: 1.0 } multikey name: 'positiveAspects.id_1_positiveAspects.score_1' io: { v: 1, key: { positiveAspects.id: 1.0, positiveAspects.score: 1.0 }, name: "positiveAspects.id_1_positiveAspects.score_1", ns: "proddb.product_trees" }
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Index 2 is kp: { categoryId: 1.0, score: 1.0 } name: 'categoryId_1_score_1' io: { v: 1, key: { categoryId: 1.0, score: 1.0 }, name: "categoryId_1_score_1", ns: "proddb.product_trees" }
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Index 3 is kp: { rank: -1.0 } name: 'rank_-1' io: { v: 1, key: { rank: -1.0 }, name: "rank_-1", ns: "proddb.product_trees" }
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Index 4 is kp: { positiveAspects.rank: -1.0 } multikey name: 'positiveAspects.rank_-1' io: { v: 1, key: { positiveAspects.rank: -1.0 }, name: "positiveAspects.rank_-1", ns: "proddb.product_trees" }
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Predicate over field 'score'
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Predicate over field 'categoryId'
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Relevant index 0 is kp: { categoryId: 1.0, score: 1.0 } name: 'categoryId_1_score_1' io: { v: 1, key: { categoryId: 1.0, score: 1.0 }, name: "categoryId_1_score_1", ns: "proddb.product_trees" }
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Rated tree:
$and
    categoryId == "803"  || First: 0 notFirst: full path: categoryId
    score $gte 8.0  || First: notFirst: 0 full path: score
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Tagging memoID 1
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Enumerator: memo just before moving:
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] About to build solntree from tagged tree:
$and
    categoryId == "803"  || Selected Index #0 pos 0
    score $gte 8.0  || Selected Index #0 pos 1
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Planner: adding solution:
FETCH
---fetched = 1
---sortedByDiskLoc = 0
---getSort = [{ categoryId: 1 }, { categoryId: 1, score: 1 }, { score: 1 }, ]
---Child:
------IXSCAN
---------keyPattern = { categoryId: 1.0, score: 1.0 }
---------direction = 1
---------bounds = field #0['categoryId']: ["803", "803"], field #1['score']: [8.0, inf.0]
---------fetched = 0
---------sortedByDiskLoc = 0
---------getSort = [{ categoryId: 1 }, { categoryId: 1, score: 1 }, { score: 1 }, ]
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Planner: outputted 1 indexed solutions.
2016-06-01T16:10:49.140-0500 D QUERY    [conn47] Only one plan is available; it will be run but will not be cached. query: { categoryId: "803", score: { $gte: 8.0 } } sort: {} projection: {}, planSummary: IXSCAN { categoryId: 1.0, score: 1.0 }
2016-06-01T16:11:27.170-0500 I COMMAND  [conn47] command proddb.product_trees command: aggregate { aggregate: "product_trees", pipeline: [ { $match: { categoryId: "803", score: { $gte: 8.0 } } }, { $unwind: "$positiveAspects" }, { $match: { positiveAspects.id: 30030525, positiveAspects.score: { $gte: 9.0 } } }, { $sort: { positiveAspects.rank: -1 } }, { $project: { _id: 0, score: 1, id: 1, name: 1, positiveAspects: 1 } }, { $limit: 10 } ], cursor: {} } keyUpdates:0 writeConflicts:0 numYields:226 reslen:7459 locks:{ Global: { acquireCount: { r: 906 } }, Database: { acquireCount: { r: 453 } }, Collection: { acquireCount: { r: 453 } } } protocol:op_query 38030ms

Taking out the $sort, the query runs in 2 seconds.

Can you explain why the $sort causes such a performance hit, considering there is an index it could use? Is there an index I missed? What can be done to fix it?

Thanks!

asked Jun 01 '16 21:06

Seffy

1 Answer

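It's because $sort can use an index only when it runs at the beginning of the aggregation pipeline. Here it runs after $unwind, so it cannot use the positiveAspects.rank index and has to sort its input in memory. To take advantage of an index, $match or $sort must be the first stage of the pipeline.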

Please see Pipeline Operators and Indexes
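
One way to act on this is to shrink the data before the $unwind, so that the in-memory $sort has less to process. The pipeline below is only a sketch under assumptions: the field names and values come from the question, but the restructuring ($elemMatch in the first $match, $filter before $unwind, $limit placed directly after $sort) is my suggestion and has not been measured against this collection.

```javascript
// Sketch only: same filters as the question's pipeline, reordered so that
// less data reaches $unwind and $sort. Not benchmarked on the real data.
const pipeline = [
  {
    // First stage, so it can use an index such as categoryId_1_score_1.
    // The $elemMatch also drops products that have no matching aspect at all.
    $match: {
      categoryId: "803",
      score: { $gte: 8.0 },
      positiveAspects: {
        $elemMatch: { id: 30030525, score: { $gte: 9.0 } }
      }
    }
  },
  {
    // Keep only the matching array elements ($filter exists since MongoDB 3.2),
    // so $unwind emits a handful of documents instead of hundreds per product.
    $project: {
      _id: 0,
      score: 1,
      id: 1,
      name: 1,
      positiveAspects: {
        $filter: {
          input: "$positiveAspects",
          as: "aspect",
          cond: {
            $and: [
              { $eq: ["$$aspect.id", 30030525] },
              { $gte: ["$$aspect.score", 9.0] }
            ]
          }
        }
      }
    }
  },
  { $unwind: "$positiveAspects" },
  // Still an in-memory sort, but over far fewer documents.
  { $sort: { "positiveAspects.rank": -1 } },
  // $limit directly after $sort lets the server keep only the top 10 while sorting.
  { $limit: 10 }
];

// In the mongo shell: db.product_trees.aggregate(pipeline)
```

The sort itself still does not use the positiveAspects.rank index, because it runs after $unwind. The aim is only to make the sorted set small.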

answered Nov 15 '22 06:11

Saleem
