Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What kind of data structure is used for a compound index?

Tags:

mongodb

From the MongoDB docs: MongoDB indexes use a B-tree data structure.

But, does it also apply to the compound indexes? In affirmative case, how is it actually implemented?

PD: The only way I imagine it is as a B-tree in which each node has not a single value, but as many values as indexes stored, for example, in an array (i.e. as if two binary trees (or more, one for each index) had been merged).

like image 540
Franzech Domâs Avatar asked Oct 29 '22 23:10

Franzech Domâs


1 Answers

I cannot say anything with 100% certainty on the actual implementation, but I do think of compound indices as implemented as B-trees as well, with, as you describe, nodes that have sequences of values (respecting the index key order), and, very importantly using a lexicographic order on the key values.

Having said that, in order to save some space, I would also consider a B-tree that only uses the values associated with the first key on the top of the tree, then the values of the second key a bit further down, ... and the value of the last key close to the leaves. This is nothing else than an optimization that makes the boundaries of the nodes coincide with the individual keys.

The advantage of implementing a compound index this way (with or without the said optimization) is that, if you have an index on several keys, let's say A then B then C, then you get an index on A alone for free, and an index on A then B for free: indeed, all values for any prefix of the sequence of keys are always grouped together in such a B-tree, because of the lexicographic order.

Since MongoDB documents that such is the case, I think of the implementation of compound indices this way when I use MongoDB.

Furthermore, the documentation specifies that hashed index fields are forbidden on compound indices. This is one more clue, as B-trees implement range indices.

Also, I would expect MongoDB's hash indices to be implemented with a hash table rather than B-trees, as it would be less efficient to use a B-tree for point queries only (logarithmic lookup vs. O(1))

like image 89
Ghislain Fourny Avatar answered Nov 08 '22 03:11

Ghislain Fourny