MongoDB 3.2, installed on CentOS 6, with plenty of RAM and disk. I have a collection with 10K documents of the following structure:
{
  "id": 5752034,
  "score": 7.6,
  "name": "ASUS X551 15.6-inch Laptop",
  "categoryId": "803",
  "positiveAspects": [
    {
      "id": 30030525,
      "name": "price",
      "score": 9.8,
      "frequency": 139,
      "rank": 100098
    },
    {
      "id": 30028399,
      "name": "use",
      "score": 9.9,
      "frequency": 99,
      "rank": 100099
    }
    .
    .
  ]
}
In each document, the nested array positiveAspects has a few hundred elements.
The collection has the following indexes:
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "proddb.product_trees" }
{ "v" : 1, "key" : { "positiveAspects.id" : 1.0, "positiveAspects.score" : 1.0 }, "name" : "positiveAspects.id_1_positiveAspects.score_1", "ns" : "proddb.product_trees" }
{ "v" : 1, "key" : { "categoryId" : 1.0, "score" : 1.0 }, "name" : "categoryId_1_score_1", "ns" : "proddb.product_trees" }
{ "v" : 1, "key" : { "rank" : -1.0 }, "name" : "rank_-1", "ns" : "proddb.product_trees" }
{ "v" : 1, "key" : { "positiveAspects.rank" : -1.0 }, "name" : "positiveAspects.rank_-1", "ns" : "proddb.product_trees" }
I would like to run the following aggregation, which takes about 40 seconds:
{
  aggregate: "product_trees",
  pipeline: [
    {
      $match: {
        categoryId: "803",
        score: { $gte: 8.0 }
      }
    },
    { $unwind: "$positiveAspects" },
    {
      $match: {
        "positiveAspects.id": 30030525,
        "positiveAspects.score": { $gte: 9.0 }
      }
    },
    { $sort: { "positiveAspects.rank": -1 } },
    {
      $project: {
        _id: 0,
        score: 1,
        id: 1,
        name: 1,
        positiveAspects: 1
      }
    },
    { $limit: 10 }
  ]
}
With the following explain:
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Beginning planning...
=============================
Options = NO_BLOCKING_SORT INDEX_INTERSECTION
Canonical query:
ns=proddb.product_treesTree: $and
categoryId == "803"
score $gte 8.0
Sort: {}
Proj: {}
=============================
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Index 0 is kp: { _id: 1 } unique name: '_id_' io: { v: 1, key: { _id: 1 }, name: "_id_", ns: "proddb.product_trees" }
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Index 1 is kp: { positiveAspects.id: 1.0, positiveAspects.score: 1.0 } multikey name: 'positiveAspects.id_1_positiveAspects.score_1' io: { v: 1, key: { positiveAspects.id: 1.0, positiveAspects.score: 1.0 }, name: "positiveAspects.id_1_positiveAspects.score_1", ns: "proddb.product_trees" }
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Index 2 is kp: { categoryId: 1.0, score: 1.0 } name: 'categoryId_1_score_1' io: { v: 1, key: { categoryId: 1.0, score: 1.0 }, name: "categoryId_1_score_1", ns: "proddb.product_trees" }
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Index 3 is kp: { rank: -1.0 } name: 'rank_-1' io: { v: 1, key: { rank: -1.0 }, name: "rank_-1", ns: "proddb.product_trees" }
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Index 4 is kp: { positiveAspects.rank: -1.0 } multikey name: 'positiveAspects.rank_-1' io: { v: 1, key: { positiveAspects.rank: -1.0 }, name: "positiveAspects.rank_-1", ns: "proddb.product_trees" }
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Predicate over field 'score'
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Predicate over field 'categoryId'
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Relevant index 0 is kp: { categoryId: 1.0, score: 1.0 } name: 'categoryId_1_score_1' io: { v: 1, key: { categoryId: 1.0, score: 1.0 }, name: "categoryId_1_score_1", ns: "proddb.product_trees" }
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Rated tree:
$and
categoryId == "803" || First: 0 notFirst: full path: categoryId
score $gte 8.0 || First: notFirst: 0 full path: score
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Tagging memoID 1
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Enumerator: memo just before moving:
2016-06-01T16:10:49.140-0500 D QUERY [conn47] About to build solntree from tagged tree:
$and
categoryId == "803" || Selected Index #0 pos 0
score $gte 8.0 || Selected Index #0 pos 1
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Planner: adding solution:
FETCH
---fetched = 1
---sortedByDiskLoc = 0
---getSort = [{ categoryId: 1 }, { categoryId: 1, score: 1 }, { score: 1 }, ]
---Child:
------IXSCAN
---------keyPattern = { categoryId: 1.0, score: 1.0 }
---------direction = 1
---------bounds = field #0['categoryId']: ["803", "803"], field #1['score']: [8.0, inf.0]
---------fetched = 0
---------sortedByDiskLoc = 0
---------getSort = [{ categoryId: 1 }, { categoryId: 1, score: 1 }, { score: 1 }, ]
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Planner: outputted 1 indexed solutions.
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Only one plan is available; it will be run but will not be cached. query: { categoryId: "803", score: { $gte: 8.0 } } sort: {} projection: {}, planSummary: IXSCAN { categoryId: 1.0, score: 1.0 }
2016-06-01T16:11:27.170-0500 I COMMAND [conn47] command proddb.product_trees command: aggregate { aggregate: "product_trees", pipeline: [ { $match: { categoryId: "803", score: { $gte: 8.0 } } }, { $unwind: "$positiveAspects" }, { $match: { positiveAspects.id: 30030525, positiveAspects.score: { $gte: 9.0 } } }, { $sort: { positiveAspects.rank: -1 } }, { $project: { _id: 0, score: 1, id: 1, name: 1, positiveAspects: 1 } }, { $limit: 10 } ], cursor: {} } keyUpdates:0 writeConflicts:0 numYields:226 reslen:7459 locks:{ Global: { acquireCount: { r: 906 } }, Database: { acquireCount: { r: 453 } }, Collection: { acquireCount: { r: 453 } } } protocol:op_query 38030ms
Taking out the $sort, the query runs in 2 seconds.
Can you explain why the $sort causes such a performance hit, considering there is an index it can use? Is there an index I missed? What can be done to fix it?
Thanks!
MongoDB aggregation - sort makes the query very slow
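It's because $sort can use an index only when it runs at the beginning of the pipeline, before any stage that reshapes the documents ($project, $unwind, $group). Here $sort comes after $unwind, which emits new in-memory documents that no index covers, so the positiveAspects.rank index cannot be used and the sort is done in memory. Only the first $match uses an index (categoryId_1_score_1), as the planner log in the question shows.
The sort also explains the gap between 2 and 40 seconds: without it, $limit: 10 lets the pipeline stop after the first ten matches, whereas with it every matching document has to be unwound and filtered before the sort can emit its first result.
To take advantage of indexing, $match or $sort must be used as the first stage.
Please see "Pipeline Operators and Indexes" in the MongoDB documentation.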
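One way to act on this is to shrink the data before it reaches $unwind and $sort. The following is an untested sketch, not a measured fix. It assumes the $filter operator (available from MongoDB 3.2) and assumes positiveAspects.score is always numeric. The helper name buildPipeline is made up for illustration. The first $match adds an $elemMatch so documents without a qualifying aspect are dropped early, $filter keeps only the qualifying array elements, and $limit is placed directly after $sort, which the documentation describes as allowing the server to keep only the top results while sorting.

```javascript
// Hypothetical helper: builds a pipeline that returns the same rows as the
// pipeline in the question, but trims each array before $unwind.
function buildPipeline(categoryId, minScore, aspectId, minAspectScore) {
  return [
    {
      // First stage, so it can still use an index. $elemMatch requires one
      // array element to satisfy both conditions at once.
      $match: {
        categoryId: categoryId,
        score: { $gte: minScore },
        positiveAspects: {
          $elemMatch: { id: aspectId, score: { $gte: minAspectScore } }
        }
      }
    },
    {
      // Keep only the wanted fields and the qualifying array elements.
      // Assumption: score is numeric in every element.
      $project: {
        _id: 0,
        score: 1,
        id: 1,
        name: 1,
        positiveAspects: {
          $filter: {
            input: "$positiveAspects",
            as: "aspect",
            cond: {
              $and: [
                { $eq: ["$$aspect.id", aspectId] },
                { $gte: ["$$aspect.score", minAspectScore] }
              ]
            }
          }
        }
      }
    },
    // The array now holds only matching elements, so $unwind emits far
    // fewer documents than unwinding a few hundred elements per document.
    { $unwind: "$positiveAspects" },
    // This sort is still done in memory, but over a much smaller input.
    { $sort: { "positiveAspects.rank": -1 } },
    { $limit: 10 }
  ];
}

var pipeline = buildPipeline("803", 8.0, 30030525, 9.0);

// Runs only inside the mongo shell, where `db` and `printjson` exist.
if (typeof db !== "undefined") {
  db.product_trees.aggregate(pipeline).forEach(printjson);
}
```

The $sort here still cannot use the positiveAspects.rank index, because it runs after $project and $unwind. The intent is only to make the in-memory sort cheap by feeding it fewer documents.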