I'm writing a query that requires a $lookup between two tables and as I understand it, it's essential that the foreignField have an index in order to perform this join in a timely fashion. However, even after adding an index on the field, the query is still falling back to COLLSCAN.
db.users.aggregate([
{$lookup:{ from: "transactions", localField: '_id', foreignField: 'uid', as: 'transaction' }},
{ $match: { transaction: { "$size" : 0} } },
{ $count: "total"},
], { explain: true })
This returns:
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.users",
"indexFilterSet" : false,
"parsedQuery" : {
},
"winningPlan" : {
"stage" : "COLLSCAN",
"direction" : "forward"
},
"rejectedPlans" : [ ]
}
As I mentioned, I do have the uid field indexed in the transactions collection:
> db.transactions.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.transactions"
},
{
"v" : 1,
"key" : {
"uid" : 1
},
"name" : "uid_1",
"ns" : "test.transactions"
}
]
The query takes a few minutes to run in a DB of approximately 7M documents. I'm using MongoDB v3.4.7. Any idea as to what I could be doing wrong? Thanks in advance!
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement.
Generally, MongoDB only uses one index to fulfill most queries. However, each clause of an $or query may use a different index, and in addition, MongoDB can use an intersection of multiple indexes.
$group does not use index data. From the mongoDB docs: The $match and $sort pipeline operators can take advantage of an index when they occur at the beginning of the pipeline.
$lookup performs an equality match on the localField to the foreignField from the documents of the from collection. If an input document does not contain the localField , the $lookup treats the field as having a value of null for matching purposes.
The "stage" : "COLLSCAN",
isn't referring to the $lookup
at all.
The first step in that aggregation pipeline is to fetch all of the documents from the 'users' collection. Since there are no filters at all provided for that, collection scan is the most efficient method.
The $lookup stage should be planned like any other query, and would likely use the index.
Because your aggregation pipeline first stage has no $match
or $sort
or $geoNear
query on indexed key and In $match stage you didn't query on any index key.
Case 1: If you do $match on indexed key in first stage, WinningPlan
stage will be "FETCH"
and stage of inputStage
will be "IXSCAN"
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
...
}
}
Case 2: If you do $match on non-indexed key in first stage, WinningPlan
stage will be "COLLSCAN"
"winningPlan" : {
"stage" : "COLLSCAN"
}
Case 3: If You do $match on index key after lookup (according to your query), WinningPlan
stage will be "FETCH"
and inputStage
will be "IXSCAN"
.
Case 4: If You do $match on non-index key after lookup (just you did), WinningPlan
stage will be "COLLSCAN"
.
For 7M record, you must use index in your queries. Don't do too much indexing, because they will stored in the RAM and you can't use $ne
or $nin
on indexed key properly.
Mongodb Docs: Optimizing Aggregation Pipeline
Mongodb Docs: Indexing Strategies
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With