Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Understanding an index on an array of subdocuments

I've been looking into array (multi-key) indexing on MongoDB and I have the following questions that I haven't been able to find much documentation on:

Indexes on an array of subdocuments

So if I have an array field that looks something like:

{field : [
  {a : "1"}, 
  {b : "2"}, 
  {c : "3"}
  ]
}

I am querying only on field.a and field.c individually (not both together), I believe I have a choice between the following alternatives:

  1. db.Collection.ensureIndex({field : 1});
  2. db.Collection.ensureIndex({field.a : 1}); db.Collection.ensureIndex({field.c : 1});

That is: an index on the entire array; or two indexes on the embedded fields. Now my questions are:

  • How do you visualize an index on the entire array in option 1 (is it even useful)? What queries is such an index useful for?
  • Given the querying situation I have described, which of the above two options is better, and why?
like image 344
Zaid Masud Avatar asked Nov 27 '12 11:11

Zaid Masud


1 Answers

You are correct that if you are querying only on the value of a in the field array, both indexes will, in a sense, help you make your query more performant.

However, have a look at the following 3 queries:

> db.zaid.save({field : [{a: 1}, {b: 2}, {c: 3}] });
> db.zaid.ensureIndex({field:1});
> db.zaid.ensureIndex({"field.a":1});

#Query 1
> db.zaid.find({"field.a":1})
{ "_id" : ObjectId("50b4be3403634cff61158dd0"), "field" : [ { "a" : 1 }, { "b" : 2 }, { "c" : 3 } ] }
> db.zaid.find({"field.a":1}).explain();
{
    "cursor" : "BtreeCursor field.a_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {
        "field.a" : [
            [
                1,
                1
            ]
        ]
    }
}

#Query 2
> db.zaid.find({"field.b":1}).explain();
{
    "cursor" : "BasicCursor",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 0,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {

    }
}

#Query 3
> db.zaid.find({"field":{b:1}}).explain();
{
    "cursor" : "BtreeCursor field_1",
    "nscanned" : 0,
    "nscannedObjects" : 0,
    "n" : 0,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {
        "field" : [
            [
                {
                    "b" : 1
                },
                {
                    "b" : 1
                }
            ]
        ]
    }
}

Notice that the second query doesn't have an index on it, even though you indexed the array, but the third query does. Choosing your indexes based on how you intend to query your data is as important as considering whether the index itself is what you need. In Mongo, the structure of your index can and does make a very large difference on the performance of your queries if you aren't careful. I think that explains your first question.

Your second question is a bit more open ended, but I think the answer, again, lies in how you expect to query your data. If you will only ever be interested in matching on values of "fields.a", then you should save room in memory for other indexes which you might need down the road. If, however, you are equally likely to query on any of those items in the array, and you are reasonably certain that the array will no grow infinitely (never index on an array that will potentially grow over time to an unbound size. The index will be unable to index documents once the array reaches 1024 bytes in BSON.), then you should index the full array. An example of this might be a document for a hand of playing cards which contains an array describing each card in a users hand. You can index on this array without fear of overflowing beyond the index size boundary since a hand could never have more than 52 cards.

like image 181
rdrkt Avatar answered Oct 19 '22 23:10

rdrkt