I have a collection of documents with a multikey index defined. However, the performance of the query is pretty poor for just 43K documents. Is ~215ms for this query considered poor? Did I define the index correctly if nscanned is 43902 (which equals the total documents in the collection)?
Document:
{
"_id": {
"$oid": "50f7c95b31e4920008dc75dc"
},
"bank_accounts": [
{
"bank_id": {
"$oid": "50f7c95a31e4920009b5fc5d"
},
"account_id": [
"ff39089358c1e7bcb880d093e70eafdd",
"adaec507c755d6e6cf2984a5a897f1e2"
]
}
],
"created_date": "2013,01,17,09,50,19,274089",
}
Index:
{ "bank_accounts.bank_id" : 1 , "bank_accounts.account_id" : 1}
Query:
db.visitor.find({ "bank_accounts.account_id" : "ff39089358c1e7bcb880d093e70eafdd" , "bank_accounts.bank_id" : ObjectId("50f7c95a31e4920009b5fc5d")}).explain()
Explain:
{
"cursor" : "BtreeCursor bank_accounts.bank_id_1_bank_accounts.account_id_1",
"isMultiKey" : true,
"n" : 1,
"nscannedObjects" : 43902,
"nscanned" : 43902,
"nscannedObjectsAllPlans" : 43902,
"nscannedAllPlans" : 43902,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 213,
"indexBounds" : {
"bank_accounts.bank_id" : [
[
ObjectId("50f7c95a31e4920009b5fc5d"),
ObjectId("50f7c95a31e4920009b5fc5d")
]
],
"bank_accounts.account_id" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "Not_Important"
}
I see three factors in play.
First, for application purposes, make sure that $elemMatch isn't a more appropriate query for this use-case. http://docs.mongodb.org/manual/reference/operator/elemMatch/. It seems like it would be bad if the wrong results came back due to multiple subdocuments satisfying the query.
Second, I imagine the high nscanned value can be accounted for by querying on each of the field values independently. .find({ bank_accounts.bank_id: X }) vs. .find({"bank_accounts.account_id": Y}). You may see that nscanned for the full query is about equal to nscanned of the largest subquery. If the index key were being evaluated fully as a range, this would not be expected, but...
Third, the { "bank_accounts.account_id" : [[{"$minElement" : 1},{"$maxElement" : 1}]] } clause of the explain plan shows that no range is being applied to this portion of the key.
Not really sure why, but I suspect it has something to do with account_id's nature (an array within a subdocument within an array). 200ms seems about right for an nscanned that high.
A more performant document organization might be to denormalize the account_id -> bank_id relationship within the subdocument, and store:
{"bank_accounts": [
{
"bank_id": X,
"account_id: Y,
},
{
"bank_id": X,
"account_id: Z,
}
]}
instead of: {"bank_accounts": [{ "bank_id": X, "account_id: [Y, Z], }]}
My tests below show that with this organization, the query optimizer gets back to work and exerts a range on both keys:
> db.accounts.insert({"something": true, "blah": [{ a: "1", b: "2"} ] })
> db.accounts.ensureIndex({"blah.a": 1, "blah.b": 1})
> db.accounts.find({"blah.a": 1, "blah.b": "A RANGE"}).explain()
{
"cursor" : "BtreeCursor blah.a_1_blah.b_1",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 0,
"nscanned" : 0,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 0,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"blah.a" : [
[
1,
1
]
],
"blah.b" : [
[
"A RANGE",
"A RANGE"
]
]
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With