Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mongodb simple prefix query with regex and sort is slow

I'm stuck with this simple prefix query. Although Mongo docs state that you can get pretty good performance by using the prefix regex format (/^a/), the query is pretty slow when I try to sort the results:

940 millis

db.posts.find({hashtags: /^noticias/ }).limit(15).sort({rank : -1}).hint('hashtags_1_rank_-1').explain()

{
"cursor" : "BtreeCursor hashtags_1_rank_-1 multi",
"isMultiKey" : true,
"n" : 15,
"nscannedObjects" : 142691,
"nscanned" : 142692,
"nscannedObjectsAllPlans" : 142691,
"nscannedAllPlans" : 142692,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 1,
"nChunkSkips" : 0,
"millis" : 934,
"indexBounds" : {
    "hashtags" : [
        [
            "noticias",
            "noticiat"
        ],
        [
            /^noticias/,
            /^noticias/
        ]
    ],
    "rank" : [
        [
            {
                "$maxElement" : 1
            },
            {
                "$minElement" : 1
            }
        ]
    ]
},
"server" : "XRTZ048.local:27017"
}

However, the unsorted version of the same query is super fast:

0 millis

db.posts.find({hashtags: /^noticias/ }).limit(15).hint('hashtags_1_rank_-1').explain()

{
"cursor" : "BtreeCursor hashtags_1_rank_-1 multi",
"isMultiKey" : true,
"n" : 15,
"nscannedObjects" : 15,
"nscanned" : 15,
"nscannedObjectsAllPlans" : 15,
"nscannedAllPlans" : 15,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
    "hashtags" : [
        [
            "noticias",
            "noticiat"
        ],
        [
            /^noticias/,
            /^noticias/
        ]
    ],
    "rank" : [
        [
            {
                "$maxElement" : 1
            },
            {
                "$minElement" : 1
            }
        ]
    ]
},
"server" : "XRTZ048.local:27017"

}

The query is also fast if I remove the regex and sort:

0 millis

db.posts.find({hashtags: 'noticias' }).limit(15).sort({rank : -1}).hint('hashtags_1_rank_-1').explain()

{
"cursor" : "BtreeCursor hashtags_1_rank_-1",
"isMultiKey" : true,
"n" : 15,
"nscannedObjects" : 15,
"nscanned" : 15,
"nscannedObjectsAllPlans" : 15,
"nscannedAllPlans" : 15,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
    "hashtags" : [
        [
            "noticias",
            "noticias"
        ]
    ],
    "rank" : [
        [
            {
                "$maxElement" : 1
            },
            {
                "$minElement" : 1
            }
        ]
    ]
},
"server" : "XRTZ048.local:27017"

}

It seems like using both regex and sort makes Mongo scan lots of records. However, sort is scanning just 15 if I don't use the regex. What's wrong here?

like image 245
jaime Avatar asked Oct 23 '12 15:10

jaime


1 Answers

The scanAndOrder: true in the explain output indicates that the query is having to retrieve the documents and then sort them in memory before the output is returned. This is an expensive operation, and will be having an impact on the performance of your query.

The existence of scanAndOrder: true as well as the difference in nscanned an n in the explain output indicates that the query is not using an optimal index. In this case it appears to be needing to do a collection scan. You might be able to alleviate this issue by including the index keys in your sort criteria. From my testing:

db.posts.find({hashtags: /^noticias/ }).limit(15).sort({hashtags:1, rank : -1}).explain()

Does not require a scan and order, and returns n and nscanned of the number of records you are looking for. This would also mean sorting on the hashtags key, which may or may not be useful to you, but should increase the performance of the query.

like image 137
Andre de Frere Avatar answered Nov 15 '22 17:11

Andre de Frere