Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Indexing MongoDB for quicker find() with sort(), on different fields

I'm running lots of queries of such type:

db.mycollection.find({a:{$gt:10,$lt:100}, b:4}).sort({c:-1, a:-1})

What sort of index should I use to speed it up? I think I'll need to have both {a:1, b:1} and {c:-1, a:-1}, am I right? Or these indexes will somehow interfere with each other at no performance gain?

EDIT: The actual problem for me is that I run many queries in a loop, some of them over small range, others over large range. If I put index on {a:1, b:1}, it selects small chunks very quickly, but when it comes to large range I see an error "too much data for sort() with no index". If, otherwise, I put index on {c:-1, a:-1}, there is no error, but the smaller chunks (and there are more of those) are processed much slower. So, how is it possible to keep the quickness of selection for smaller ranges, but not get error on large amount of data?

If it matters, I run queries through Python's pymongo.

like image 770
sashkello Avatar asked Sep 25 '13 07:09

sashkello


3 Answers

If you had read the documentation you would have seen that using two indexes here would have been useless since MongoDB only uses one index per query (unless it is an $or) until: https://jira.mongodb.org/browse/SERVER-3071 is implemented.

Not only that but also when using a compound sort the order in the index must match the sort order for a index to be used correctly, as such:

Or these indexes will somehow interfere with each other at no performance gain?

If intersectioning were implemented no they would not, {a:1,b:1} does not match the sort and {c:-1,a:-1} is sub-optimal for answering the find() plus a is not a prefix of that compound.

So immediately an iteration of a optimal index would be:

{a:-1,b:1,c:-1}

But this isn't the full story. Since $gt and $lt are actually ranges, like $in they suffer the same problem with indexes, this article should provide the answer: http://blog.mongolab.com/2012/06/cardinal-ins/ don't really see any reason to repeat its content.

like image 52
Sammaye Avatar answered Oct 17 '22 00:10

Sammaye


Disclaimer: For MongoDB v2.4

Using hint is a nice solution, since it will force the query to use indexes that you chose, so you can optimize the query with different indexes until you are satisfied. The downside is that you are setting your own index per request. I prefer to set the indexes on the entire collection and let Mongo choose the correct (fastest) index for me, especially for queries that are used repeatedly.

You have two problems in your query:

  • Never sort on params that are not indexed. You will get this error: "too much data for sort() with no index" if the amount of documents in your .find() are very big, the size depends on the version of mongo that you use. This means that you must have indexes on A and C in order for your query to work.
  • Now for the bigger problem. You are performing a range query ($lt and $gt on param A), which can't work with Mongo. MongoDB only uses one index at a time, you are using two indexes on the same parameter. There are several solutions to deal with it in your code:

    1. r = range( 11,100 )
      db.mycollection.find({a:{$in: r }, b:4}).sort({c:-1, a:-1})

    2. Use only $lt or $gt in your query,
      db.mycollection.find({ a: { $lt:100 }, b:4}).sort({c:-1, a:-1})
      Get the results and filter them in your python code. This solution will return more data, so if you have millions of results with that are less then A=11, don't use it!
      If you choose this option, make sure you use a compound key with A and B.

Pay attention when using $or in your queries, since $or is less efficiently optimized than $in with it's usage of indexes.

like image 21
Oran Avatar answered Oct 17 '22 01:10

Oran


If you define an index {c:-1,a:-1,b:1} it will help with some considerations.

With this option the index fully will be scanned, but based on the index values only the apropriate documents will be visited, and they will be visited in the right order so the ordering phase will not be needed after getting the results. If the index is huge i do not know how it will behave, but i assume when the result would be small it will be slower in case of the resultset is big it will be faster.

About prefix matching. If you hint the index and lower levels are useable to serve the query those levels will be used for. To demonstrate this behaviour i made a short test.

I prepared test data with:

> db.createCollection('testIndex')
{ "ok" : 1 }
> db.testIndex.ensureIndex({a:1,b:1})
> db.testIndex.ensureIndex({c:-1,a:-1})
> db.testIndex.ensureIndex({c:-1,a:-1,b:1})
> for(var i=1;i++<500;){db.testIndex.insert({a:i,b:4,c:i+5});}
> for(var i=1;i++<500;){db.testIndex.insert({a:i,b:6,c:i+5});}

te result of the query with hint:

> db.testIndex.find({a:{$gt:10,$lt:100}, b:4}).hint('c_-1_a_-1_b_1').sort({c:-1, a:-1}).explain()
{
    "cursor" : "BtreeCursor c_-1_a_-1_b_1",
    "isMultiKey" : false,
    "n" : 89,
    "nscannedObjects" : 89,
    "nscanned" : 588,
    "nscannedObjectsAllPlans" : 89,
    "nscannedAllPlans" : 588,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 1,
    "indexBounds" : {
        "c" : [
            [
                {
                    "$maxElement" : 1
                },
                {
                    "$minElement" : 1
                }
            ]
        ],
        "a" : [
            [
                100,
                10
            ]
        ],
        "b" : [
            [
                4,
                4
            ]
        ]
    },
    "server" :""
}

Explanation of the output is the index is scanned that is why nscanned is 588 (number of scanned index entries and documents), the number at nscannedObjects is the number of the scanned documents. So based on the index mongo only reads those documents which match the criteria (the index partially covering or so). as you can see scanAndOrder is false so there is no sorting phase. (that implicates if the index is in memory that will be fast)

Along with the article what others linked : http://blog.mongolab.com/wp-content/uploads/2012/06/IndexVisitation-4.png you have to put first the sort keys in the index and the query keys after, if they have a subset match you have to include the subset in the very same order as they in the sorting criteria (while it does not matter for the query part).

like image 2
attish Avatar answered Oct 17 '22 01:10

attish