Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB query using embedded document as key

Tags:

mongodb

Short version:

If I have an index {"category": 1}, and a document {"category": {type: "memory", class: "DDR400"}, how can I do a query such as {"category.type": "memory"} that uses my index?

Long version:

With MongoDB, I want to use an embedded document as a key for an index.

For example, I might have some documents such as this (for a hypothetical product database):

{"category": {"type": "hard-drive", "form_factor": "2.5in", "size": "500GB"}, ...}
{"category": {"type": "hard-drive", "form_factor": "3.5in", ...}, ...}
{"category": {"type": "memory", "class": "DDR400", ...}, ...}

For the above examples, I might want to do queries such as:

{"category.type": "hard-drive"}
{"category.type": "hard-drive", "category.form_factor": "2.5in"}
{"category.type": "memory"}
{"category.type": "memory", "category.class": "DDR400"}

My issues is creating an index. The document at http://www.mongodb.org/display/DOCS/Indexes#Indexes-DocumentsasKeys describes two options:

The first options is to create a compound index, for example { "category.type": 1, "category.class": 1 }. This does not work well for my case, as I might have many different types of sub-categories.

The second option is to use the document as the key: { "category": 1 }. Now a query such as {"category": {"type": "memory", "class": "DDR400"}} would use the index, but {"category": {"type": "memory"}} would return nothing, and {"category.type": "memory"} would not use the index. Is there a way to do a query using this index that would give the same results as {"category.type": "memory"}?

I suspect a query using something like {"category" {"$gt": ..., "$lt": ...} should work, but what should I put in the blank spaces there?

like image 682
Ralf Avatar asked Jul 07 '11 07:07

Ralf


2 Answers

Creating a separate index for category.type (probably in addition to category) seems like the best option.

You could use a range query with $gt and $lt. Those would work on the binary representation of the embedded object, which only works for the first (in storage order) field, and only if that first field is the same in all documents, so it is not very flexible, and easy to break.

   {"category"  : {"$gt": {"type": "memory"},  "$lt": {"type": "memoryX" } } }

"memoryX" here serves as a cut-off point: Everything with "memory" will sort before that.

Note that this requires that the "type" field is the first one in the binary representation for all documents that have it. It also ONLY works for the "type" field (no way to query on other fields in the first position, you have to choose one up front), thus giving you practically no advantage over a dedicated "category.type" index (just space savings).

I was experimenting with this idea before, see this thread on the mailing list. It does work, but you have to be careful what you are doing:

It is both supported and stable. Many of the sharding/replication internals use _id values that are embedded docs.

The only thing to watch out for here is the ordering of the keys in embedded element. They are sorted by their binary representation so {x:1, y:1} is different than {y:1, x:1}, and sorted differently. Not only are they sorted differently, they are different values. Some languages always sort the keys in a dictionary/hash/map by default.

Again, consider creating extra indexes on the fields that you need.

In my case I'll only need to query on 'a', 'a,b' or 'a,b,c', or on 'a,x,y', where documents containing x never contain 'b' or 'c'

That would probably work then. I'd still do two composite indexes a,b and a,x, though. Or maybe just b and x. Given that a document contains b or x, you probably already have effectively filtered out the irrelevant documents with regard to a ( form_factor = 2.5in already tells you it is a hard disk, class = DDR400 already makes it memory). And after filtering by a,b, you may not need an index to drill down further on c.

By using this tricky query on the binary representation you are making yourself dependent on what could be called an implementation detail. You may be hit by drivers that like to re-order fields, or something like this issue about Mongo itself reshuffling things sometimes.

like image 169
Thilo Avatar answered Sep 24 '22 23:09

Thilo


If there is one basic property that you are searching for for each "type", then simply also add is as a separate field, and create a compound index, eg:

{"category": {"type": "hard-drive", "form_factor": "2.5in", "searchfield: "2.5in", ...}, ...}
{"category": {"type": "memory", "class": "DDR400", searchfield: "DDR400", ...}, ...}

If there are several fields you are searching for, but the values for these fields differ, you could add the values as tags and, again, create a compound key:

{"category": {"type": "hard-drive", "form_factor": "2.5in", "size": "500GB", "tags": ["2.5in", "500GB"]}, ...}
{"category": {"type": "memory", "class": "DDR400", "tags": ["DDR400"], ...}, ...}
like image 24
Karoly Horvath Avatar answered Sep 21 '22 23:09

Karoly Horvath