Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB Multikey Compound Index - Need Help Understanding Bounds

Tags:

mongodb

We've recently decided to revisit some of our MongoDB indexes and came across a peculiar result when using a compound index which contains a multikey part.

It's important to note that we're using v2.4.5

TLDR: When using a compound index with multikey part, the bounds of a non-multikey field used for range restriction are dropped.

I'll explain the problem with an example:

Create some data

db.demo.insert(
[{ "foo" : 1, "attr" : [  {  "name" : "a" },  {  "name" : "b" },  {  "name" : "c" } ]},
 { "foo" : 2, "attr" : [  {  "name" : "b" },  {  "name" : "c" },  {  "name" : "d" } ]},
 { "foo" : 3, "attr" : [  {  "name" : "c" },  {  "name" : "d" },  {  "name" : "e" } ]},
 { "foo" : 4, "attr" : [  {  "name" : "d" },  {  "name" : "e" },  {  "name" : "f" } ]}])

Index

db.demo.ensureIndex({'attr.name': 1, 'foo': 1})

Query & Explain

Query on 'attr.name' but constrain the range of the non-multikey field 'foo':

db.demo.find({foo: {$lt:3, $gt: 1}, 'attr.name': 'c'}).hint('attr.name_1_foo_1').explain()
{
    "cursor" : "BtreeCursor attr.name_1_foo_1",
    "isMultiKey" : true,
    "n" : 1,
    "nscannedObjects" : 2,
    "nscanned" : 2,
    "nscannedObjectsAllPlans" : 2,
    "nscannedAllPlans" : 2,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "attr.name" : [
            [
                "c",
                "c"
            ]
        ],
        "foo" : [
            [
                -1.7976931348623157e+308,
                3
            ]
        ]
    }
}

As you can see, the range of 'foo' is not as defined in the query, one end is completely ignored which results in nscanned being larger than it should.

Changing the order of the range operands will alter the dropped end:

db.demo.find({foo: {$gt: 1, $lt:3}, 'attr.name': 'c'}).hint('attr.name_1_foo_1').explain()
{
    "cursor" : "BtreeCursor attr.name_1_foo_1",
    "isMultiKey" : true,
    "n" : 1,
    "nscannedObjects" : 2,
    "nscanned" : 2,
    "nscannedObjectsAllPlans" : 2,
    "nscannedAllPlans" : 2,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "attr.name" : [
            [
                "c",
                "c"
            ]
        ],
        "foo" : [
            [
                1,
                1.7976931348623157e+308
            ]
        ]
    }
}

We're either missing out on some multikey index basics, or we're facing a bug.

We've gone through similar topics, including:

  • https://groups.google.com/forum/#!searchin/mongodb-user/multikey$20bounds/mongodb-user/RKrsyzRwHrE/_i0SxdJV5qcJ
  • Order of $lt and $gt in MongoDB range query

Unfortunately these posts address a different use-case where a range is set on the multikeyed value.

Other things we've tried to do:

  • Change the compound index ordering, starting with the non-multikey field.

  • Put the 'foo' value inside each of the subdocuments in the 'attr' array, index by ('attr.name', 'attr.foo') and do an $elemMatch on 'attr' with a range constraint on 'foo'.

  • Use an $and operator when defining the range:

    db.demo.find({'attr.name': 'c', $and: [{num: {$lt: 3}}, {num: {$gt: 1}}]})
    
  • Use MongoDB v2.5.4

None of the above had any effect (v2.5.4 made things worse by dumping both ends of the range completely).

Any kind of help would be highly appreciated!

Many Thanks,

Roi

like image 586
Roi Tal Avatar asked Dec 24 '13 15:12

Roi Tal


People also ask

Does MongoDB support Multikey indexes?

MongoDB can use the multikey index to find documents that have 5 at any position in the ratings array.

Does MongoDB support compound indexes?

In MongoDB, we can create compound index using createIndex() method. Syntax: db. collection.

What is Multikey index in MongoDB?

MongoDB allows to index a field that holds an array value by creating an index key for each element in the array, such type of indexing is called Multikey indexes. It supports efficient queries against array fields.


2 Answers

With compound indexes where one of the indexed fields is an array, MongoDB will only use either a lower or upper bound for the range query to ensure correct matches are returned. See SERVER-958 for an example where constraining to both upper and lower index bounds would not find the expected document.

If your range query is on the array field you can potentially use the $elemMatch operator to optimise your query within the expected index bounds. As at MongoDB 2.4, the $elemMatch operator does not work on non-array fields so unfortunately this doesn't help your use case. You can watch/upvote SERVER-6050: Consider allowing $elemMatch applied to non arrays in the MongoDB issue tracker.

There is also an open issue SERVER-7959: Potentially unexpected scans with compound indexes when some fields are multikey describing this behaviour.

like image 169
Stennie Avatar answered Nov 05 '22 05:11

Stennie


The $min and $max operators may help to work around this issue, by allowing you to explicitly specify the index bounds.

Example:

db.demo.find({foo: {$lt:3, $gt: 1}, 'attr.name': 'c'}).
 hint('attr.name_1_foo_1').
 min({'attr.name': 'c', foo: 1.000001}).
 max({'attr.name': 'c', foo: 3}).explain()

Result:

{
    "cursor" : "BtreeCursor attr.name_1_foo_1",
    "isMultiKey" : true,
    "n" : 1,
    "nscannedObjects" : 1,
    "nscanned" : 1,
    "nscannedObjectsAllPlans" : 1,
    "nscannedAllPlans" : 1,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "start" : {
            "attr.name" : "c",
            "foo" : 1.000001
        },
        "end" : {
            "attr.name" : "c",
            "foo" : 3
        }
    }
}

There are some important caveats though:

  1. $min is always inclusive (like $gte), and $max is always exclusive (like $lt). You may have to adjust your values to get the effect of $gt or $lte.
  2. The fields in $min and $max must exactly match the fields in the index.
  3. You can only have one set of index bounds per query. There is no equivalent for an $in or $or query.
  4. While the operators are documented, it does not seem to be recommended for normal use cases.

Point 3 is a blocker for me (I need to do an $in on the array field), so I'm still looking for another solution.

Source: https://groups.google.com/forum/#!msg/mongodb-user/oxL8wuVdITA/uWJHVbMd_-8J

like image 32
Ralf Avatar answered Nov 05 '22 05:11

Ralf