
Slow range query on a multikey index

I have a MongoDB collection named post with 35 million objects. The collection has two secondary indexes defined as follows.

    > db.post.getIndexKeys()
    [
        {
            "_id" : 1
        },
        {
            "namespace" : 1,
            "domain" : 1,
            "post_id" : 1
        },
        {
            "namespace" : 1,
            "post_time" : 1,
            "tags" : 1  // this is an array field
        }
    ]

I expect the following query, which simply filters by namespace and post_time, to run in a reasonable time without scanning all objects.

    > db.post.find({post_time: {"$gte" : ISODate("2013-04-09T00:00:00Z"), "$lt" : ISODate("2013-04-09T01:00:00Z")}, namespace: "my_namespace"}).count()

However, it takes MongoDB at least ten minutes to return the result and, curiously, according to explain() it scans 70 million objects to do the job.

    > db.post.find({post_time: {"$gte" : ISODate("2013-04-09T00:00:00Z"), "$lt" : ISODate("2013-04-09T01:00:00Z")}, namespace: "my_namespace"}).explain()
    {
        "cursor" : "BtreeCursor namespace_1_post_time_1_tags_1",
        "isMultiKey" : true,
        "n" : 7408,
        "nscannedObjects" : 69999186,
        "nscanned" : 69999186,
        "nscannedObjectsAllPlans" : 69999186,
        "nscannedAllPlans" : 69999186,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 378967,
        "nChunkSkips" : 0,
        "millis" : 290048,
        "indexBounds" : {
            "namespace" : [
                ...
            ],
            "post_time" : [
                ...
            ],
            "tags" : [
                [
                    {
                        "$minElement" : 1
                    },
                    {
                        "$maxElement" : 1
                    }
                ]
            ]
        },
        "server" : "localhost:27017"
    }

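The difference between the number of objects (35 million) and the number of scanned entries (about 70 million) must be caused by the lengths of the tags arrays, which are all equal to 2: a multikey index stores one entry per array element, so every post appears twice in the index. Still, I don't understand why the post_time filter does not make use of the index.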
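To make the arithmetic in the paragraph above concrete, here is a minimal sketch of how a multikey index multiplies entries. It is a simplified model (an assumption, not MongoDB's storage code): each document contributes one key per element of a non-empty `tags` array. The helper `multikeyEntries` and the sample documents are hypothetical.

```javascript
// Simplified model (an assumption, not MongoDB's storage code): an index on
// { namespace: 1, post_time: 1, tags: 1 } gets one key per element of a
// non-empty "tags" array, so each document adds tags.length index entries.
function multikeyEntries(docs) {
  const entries = [];
  for (const doc of docs) {
    for (const tag of doc.tags) {
      entries.push([doc.namespace, doc.post_time, tag]);
    }
  }
  return entries;
}

// Hypothetical documents, each with a two-element tags array as in the question.
const sampleDocs = [
  { namespace: "my_namespace", post_time: new Date("2013-04-09T00:30:00Z"), tags: ["a", "b"] },
  { namespace: "my_namespace", post_time: new Date("2013-04-10T12:00:00Z"), tags: ["c", "d"] },
  { namespace: "my_namespace", post_time: new Date("2013-04-11T08:00:00Z"), tags: ["a", "c"] },
];

const entries = multikeyEntries(sampleDocs);
console.log(`${sampleDocs.length} documents -> ${entries.length} index entries`);
// prints "3 documents -> 6 index entries"

// The same 2x ratio applied to roughly 35 million posts:
console.log(35000000 * 2); // 70000000
```

With two tags per post, the index holds roughly twice as many entries as there are documents, which matches the ~70 million `nscanned` figure.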

Can you tell me what I might be missing?

(I am working on a decent machine with 24 cores and 96 GB RAM. I am using MongoDB 2.2.3.)

asked May 09 '13 11:05 by Eser Aygün

1 Answer

Found my answer in this question: Order of $lt and $gt in MongoDB range query

My index is a multikey index (on tags) and I am running a range query (on post_time). Apparently, in this case MongoDB cannot use both ends of the range to bound the index scan, so it uses only the $gte clause, which comes first in the query. Because my lower limit happens to be the lowest post_time value, MongoDB ends up scanning every index entry and checks the $lt condition on each one.
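The cost of losing the upper bound can be illustrated with a toy model of a range scan over a sorted compound index on `{ namespace: 1, post_time: 1 }`. This is a simplified sketch under stated assumptions, not MongoDB's query engine: it does not model why 2.2 drops the `$lt` bound for multikey indexes (that is the behavior described above); it only counts how many keys a scan touches with and without that bound. The functions `compareKeys` and `scanRange` and the hourly sample data are hypothetical.

```javascript
// Toy model (an assumption, not MongoDB's query engine) of a B-tree range
// scan over a compound index on { namespace: 1, post_time: 1 }.

// Order index keys the way an ascending compound index would.
function compareKeys(a, b) {
  if (a.namespace !== b.namespace) return a.namespace < b.namespace ? -1 : 1;
  return a.post_time - b.post_time;
}

// Count index keys examined for
//   { namespace: q.namespace, post_time: { $gte: q.gte, $lt: q.lt } }.
// If useUpperBound is false, only $gte bounds the scan and $lt is checked
// afterwards against every scanned entry.
function scanRange(sortedKeys, q, useUpperBound) {
  let scanned = 0;
  let matched = 0;
  for (const key of sortedKeys) {
    // Keys before the start bound are skipped by the initial seek.
    if (key.namespace < q.namespace) continue;
    if (key.namespace === q.namespace && key.post_time < q.gte) continue;
    // Keys past the end bound terminate the scan.
    if (key.namespace > q.namespace) break;
    if (useUpperBound && key.post_time >= q.lt) break;
    scanned++;
    if (key.post_time < q.lt) matched++; // the $lt filter
  }
  return { scanned, matched };
}

// Hypothetical data: one post per hour on 2013-04-09 in "my_namespace",
// plus one post in another namespace.
const indexKeys = [{ namespace: "other_namespace", post_time: new Date(Date.UTC(2013, 3, 9, 0, 30)) }];
for (let hour = 0; hour < 24; hour++) {
  indexKeys.push({ namespace: "my_namespace", post_time: new Date(Date.UTC(2013, 3, 9, hour, 30)) });
}
indexKeys.sort(compareKeys);

const query = {
  namespace: "my_namespace",
  gte: new Date("2013-04-09T00:00:00Z"),
  lt: new Date("2013-04-09T01:00:00Z"),
};

console.log(scanRange(indexKeys, query, true));  // { scanned: 1, matched: 1 }
console.log(scanRange(indexKeys, query, false)); // { scanned: 24, matched: 1 }
```

Both scans return the same single match, but the lower-bound-only scan walks every later key in the namespace. This is the same pattern as `n: 7408` versus `nscanned: 69999186` in the explain output. An index without the array field corresponds to the `useUpperBound = true` case.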

Unfortunately, this is not the whole story. While trying to solve the problem, I also created non-multikey indexes, but MongoDB insisted on using the bad one. That made me think the problem was elsewhere. In the end, I had to drop the multikey index and create one without the tags field. Everything is fine now.

answered Sep 21 '22 10:09 by Eser Aygün