We've recently decided to revisit some of our MongoDB indexes and came across a peculiar result when using a compound index which contains a multikey part.
It's important to note that we're using v2.4.5
TLDR: When using a compound index with multikey part, the bounds of a non-multikey field used for range restriction are dropped.
I'll explain the problem with an example:
Create some data
db.demo.insert(
[{ "foo" : 1, "attr" : [ { "name" : "a" }, { "name" : "b" }, { "name" : "c" } ]},
{ "foo" : 2, "attr" : [ { "name" : "b" }, { "name" : "c" }, { "name" : "d" } ]},
{ "foo" : 3, "attr" : [ { "name" : "c" }, { "name" : "d" }, { "name" : "e" } ]},
{ "foo" : 4, "attr" : [ { "name" : "d" }, { "name" : "e" }, { "name" : "f" } ]}])
Index
db.demo.ensureIndex({'attr.name': 1, 'foo': 1})
Query & Explain
Query on 'attr.name' but constrain the range of the non-multikey field 'foo':
db.demo.find({foo: {$lt:3, $gt: 1}, 'attr.name': 'c'}).hint('attr.name_1_foo_1').explain()
{
"cursor" : "BtreeCursor attr.name_1_foo_1",
"isMultiKey" : true,
"n" : 1,
"nscannedObjects" : 2,
"nscanned" : 2,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 2,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"attr.name" : [
[
"c",
"c"
]
],
"foo" : [
[
-1.7976931348623157e+308,
3
]
]
}
}
As you can see, the range of 'foo' is not as defined in the query, one end is completely ignored which results in nscanned being larger than it should.
Changing the order of the range operands will alter the dropped end:
db.demo.find({foo: {$gt: 1, $lt:3}, 'attr.name': 'c'}).hint('attr.name_1_foo_1').explain()
{
"cursor" : "BtreeCursor attr.name_1_foo_1",
"isMultiKey" : true,
"n" : 1,
"nscannedObjects" : 2,
"nscanned" : 2,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 2,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"attr.name" : [
[
"c",
"c"
]
],
"foo" : [
[
1,
1.7976931348623157e+308
]
]
}
}
We're either missing out on some multikey index basics, or we're facing a bug.
We've gone through similar topics, including:
Unfortunately these posts address a different use-case where a range is set on the multikeyed value.
Other things we've tried to do:
Change the compound index ordering, starting with the non-multikey field.
Put the 'foo' value inside each of the subdocuments in the 'attr' array, index by ('attr.name', 'attr.foo') and do an $elemMatch on 'attr' with a range constraint on 'foo'.
Use an $and operator when defining the range:
db.demo.find({'attr.name': 'c', $and: [{num: {$lt: 3}}, {num: {$gt: 1}}]})
Use MongoDB v2.5.4
None of the above had any effect (v2.5.4 made things worse by dumping both ends of the range completely).
Any kind of help would be highly appreciated!
Many Thanks,
Roi
MongoDB can use the multikey index to find documents that have 5 at any position in the ratings array.
In MongoDB, we can create compound index using createIndex() method. Syntax: db. collection.
MongoDB allows to index a field that holds an array value by creating an index key for each element in the array, such type of indexing is called Multikey indexes. It supports efficient queries against array fields.
With compound indexes where one of the indexed fields is an array, MongoDB will only use either a lower or upper bound for the range query to ensure correct matches are returned. See SERVER-958 for an example where constraining to both upper and lower index bounds would not find the expected document.
If your range query is on the array field you can potentially use the $elemMatch
operator to optimise your query within the expected index bounds. As at MongoDB 2.4, the $elemMatch
operator does not work on non-array fields so unfortunately this doesn't help your use case. You can watch/upvote SERVER-6050: Consider allowing $elemMatch applied to non arrays in the MongoDB issue tracker.
There is also an open issue SERVER-7959: Potentially unexpected scans with compound indexes when some fields are multikey describing this behaviour.
The $min and $max operators may help to work around this issue, by allowing you to explicitly specify the index bounds.
Example:
db.demo.find({foo: {$lt:3, $gt: 1}, 'attr.name': 'c'}).
hint('attr.name_1_foo_1').
min({'attr.name': 'c', foo: 1.000001}).
max({'attr.name': 'c', foo: 3}).explain()
Result:
{
"cursor" : "BtreeCursor attr.name_1_foo_1",
"isMultiKey" : true,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"start" : {
"attr.name" : "c",
"foo" : 1.000001
},
"end" : {
"attr.name" : "c",
"foo" : 3
}
}
}
There are some important caveats though:
Point 3 is a blocker for me (I need to do an $in on the array field), so I'm still looking for another solution.
Source: https://groups.google.com/forum/#!msg/mongodb-user/oxL8wuVdITA/uWJHVbMd_-8J
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With