Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Projection makes query slower

I have over 600k of record in MongoDb. my user Schema looks like this:

{
    "_id" : ObjectId,
    "password" : String,
    "email" : String,
    "location" : Object,
    "followers" : Array,
    "following" : Array,
    "dateCreated" : Number,
    "loginCount" : Number,
    "settings" : Object,
    "roles" : Array,
    "enabled" : Boolean,
    "name" : Object
}

following query:

db.users.find(
     {},
     { 
         name:1, 
         settings:1,
         email:1,
         location:1
     }
).skip(656784).limit(10).explain()

results into this:

{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 10,
    "nscannedObjects" : 656794,
    "nscanned" : 656794,
    "nscannedObjectsAllPlans" : 656794,
    "nscannedAllPlans" : 656794,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 5131,
    "nChunkSkips" : 0,
    "millis" : 1106,
    "server" : "shreyance:27017",
    "filterSet" : false
}

and after removing projection same query db.users.find().skip(656784).limit(10).explain()

results into this:

{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 10,
    "nscannedObjects" : 656794,
    "nscanned" : 656794,
    "nscannedObjectsAllPlans" : 656794,
    "nscannedAllPlans" : 656794,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 5131,
    "nChunkSkips" : 0,
    "millis" : 209,
    "server" : "shreyance:27017",
    "filterSet" : false
}

As far I know projection always increase performance of a query. So I am unable to understand why MongoDB is behaving like this. Can someone explain this. And when to use projection and when not. And how actually projection is implemented in MongoDB.

like image 783
Shreyance Jain Avatar asked Jul 22 '14 11:07

Shreyance Jain


People also ask

What does Projection do in MongoDB?

In MongoDB, projection means selecting only the necessary data rather than selecting whole of the data of a document. If a document has 5 fields and you need to show only 3, then select only 3 fields from them.

What method can we use to maximize performance and prevent MongoDB from returning more results than required for processing?

Use limit() to maximize performance and prevent MongoDB from returning more results than required for processing.


2 Answers

You are correct that projection makes this skip query slower in MongoDB 2.6.3. This is related to an optimisation issue with the 2.6 query planner tracked as SERVER-13946.

The 2.6 query planner (as at 2.6.3) is adding SKIP (and LIMIT) stages after projection analysis, so the projection is being unnecessarily applied to results that get thrown out during the skip for this query. I tested a similar query in MongoDB 2.4.10 and the nScannedObjects was equal to the number of results returned by my limit rather than skip + limit.

There are several factors contributing to your query performance:

1) You haven't specified any query criteria ({}), so this query is doing a collection scan in natural order rather than using an index.

2) The query cannot be covered because there is no projection.

3) You have an extremely large skip value of 656,784.

There is definitely room for improvement on the query plan, but I wouldn't expect skip values of this magnitude to be reasonable in normal usage. For example, if this was an application query for pagination with 50 results per page your skip() value would be the equivalent of page number 13,135.

like image 188
Stennie Avatar answered Nov 15 '22 15:11

Stennie


Unless the result of your projection does something to produce an "index only" query, and that means only the the fields "projected" in the result are all present in the index only, then you are always producing more work for the query engine.

You have to consider the process:

  1. How do I match? On document or index? Find appropriate primary or other index.

  2. Given the index, scan and find things.

  3. Now what do I have to return? Is all of the data in the index? If not go back to the collection and pull the documents.

That is the basic process. So unless one of those stages "optimizes" in any way then of course things "take longer".

You need to look at this as designing a "server engine" and understand the steps that need to be undertaken. Considering none of your conditions met anything that would produce "optimal" on the specified steps you need to learn to accept that.

Your "best" case, is wher only the projected fields are the fields present in the chosen index. But really, even that has the overhead of loading the index.

So choose wisely, and understand the constraints and memory requirements for what you are writing our query for. That is what "optimization" is all about.

like image 35
Neil Lunn Avatar answered Nov 15 '22 17:11

Neil Lunn