
Expected Behaviour of Compound _id in MongoDB?

I have a compound _id containing 3 numeric properties:

"_id": { "KeyA": 0, "KeyB": 0, "KeyC": 0 }

The collection in question has 2 million documents with the same value for KeyA, and clusters of 500k documents with the same value for KeyB.

My understanding is that I can efficiently query for KeyA and KeyB using the command:

find( { "_id.KeyA" : 1, "_id.KeyB": 3 } ).limit(100)

When I explain this query the result is:

"cursor" : "BasicCursor",
"nscanned" : 1000100,
"nscannedObjects" : 1000100,
"n" : 100,
"millis" : 1592,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {}

Without the limit() the result is:

"cursor" : "BasicCursor",
"nscanned" : 2000000,
"nscannedObjects" : 2000000,
"n" : 500000,
"millis" : 3181,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {}

As I understand it, BasicCursor means that the index has been ignored, and both queries have a high execution time: even when I request only 100 records it takes ~1.5 seconds. I intended to use limit() to implement pagination, but this is obviously too slow.

The command:

find( { "_id.KeyA" : 1, "_id.KeyB": 3, "_id.KeyC": 1000 } )

correctly uses the BtreeCursor and executes quickly, which suggests the compound _id is set up correctly.

I'm using release 1.8.3 of MongoDB. Could someone clarify whether I'm seeing the expected behaviour, or have I misunderstood how to use/query the compound index?

Thanks, Paul.

Paul asked Dec 01 '22 01:12


2 Answers

The index is not a compound index, but an index on the whole value of the _id field. MongoDB does not look into an indexed field, and instead uses the raw BSON representation of a field to make comparisons (if I read the docs correctly).

To do what you want you need an actual compound index over {"_id.KeyA": 1, "_id.KeyB": 1, "_id.KeyC": 1} (which should also be a unique index). Since you cannot avoid having an index on _id, you will probably be better off leaving it as an ObjectId (which creates a smaller index and wastes less space) and keeping KeyA, KeyB and KeyC as properties of your document, e.g. {_id: ObjectId("xyz..."), KeyA: 1, KeyB: 2, KeyC: 3}
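
A minimal sketch of that suggestion in mongo shell JavaScript. It assumes a hypothetical collection named items and the 1.8-era ensureIndex helper (newer releases call the same operation createIndex). The helper name buildKeyIndex is also made up for illustration; it takes the collection handle as an argument so it can be tried outside the shell too.

```javascript
// Sketch only: `buildKeyIndex` and the collection name `items` are
// hypothetical names, not taken from the original post.
function buildKeyIndex(coll) {
  // Field order matters: this index can serve queries on KeyA alone,
  // on KeyA + KeyB, and on all three keys.
  var keys = { KeyA: 1, KeyB: 1, KeyC: 1 };
  // unique: true keeps the uniqueness guarantee the compound _id provided.
  var options = { unique: true };
  coll.ensureIndex(keys, options);
  return { keys: keys, options: options };
}

// Assumed usage in the mongo shell:
//   buildKeyIndex(db.items);
//   db.items.find({ KeyA: 1, KeyB: 3 }).limit(100).explain();
```

If the index is picked up, explain() should report a BtreeCursor rather than a BasicCursor for the two-key query.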

Theo answered Dec 06 '22 10:12


You would need a separate compound index for the behavior you desire. In general I recommend against using objects as _id because key order is significant in comparisons, so {a:1, b:1} does not equal {b:1, a:1}. Since not all drivers preserve key order in objects it is very easy to shoot yourself in the foot by doing something like this:

db.foo.save(db.foo.findOne())
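
The key-order problem can be illustrated in plain JavaScript. In this sketch JSON.stringify is only a stand-in for the server's comparison of the stored _id value (an assumption made for illustration), and sameAsStored is a hypothetical helper.

```javascript
// Stand-in for comparing two _id documents by their serialized form.
// JSON.stringify emits string keys in insertion order, much as a BSON
// document keeps its fields in the order they were written.
function sameAsStored(storedId, candidateId) {
  return JSON.stringify(storedId) === JSON.stringify(candidateId);
}

var stored = { a: 1, b: 1 };
var reordered = { b: 1, a: 1 }; // same fields, different key order

var matchesSelf = sameAsStored(stored, { a: 1, b: 1 });   // true
var matchesReordered = sameAsStored(stored, reordered);   // false
```

A driver that rebuilds the _id with its keys in a different order would therefore send a value that no longer matches the stored one, so the save above would presumably create a second document instead of updating the first.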

mstearn answered Dec 06 '22 10:12