 

limit() and sort() order in pymongo and MongoDB


Despite reading people's answers stating that the sort is done first, the evidence I see suggests the opposite: the limit is applied before the sort. Is there a way to force the sort to always happen first?

views = mongo.db.view_logging.find().sort([('count', 1)]).limit(10) 

Whether I use .sort().limit() or .limit().sort(), the limit takes precedence. I wonder if this is something to do with pymongo...
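One reason the chaining order cannot matter is that pymongo's Cursor is lazy: calling sort() or limit() only records an option on the cursor, and the query is sent to the server, with both options together, when you first iterate over it. The sketch below is a hypothetical, simplified stand-in (the class name LazyCursor is made up, it is not pymongo's implementation, and it only accepts the list-of-pairs form of sort()) that mimics those semantics in plain Python: options are stored in any order, and at iteration time the sort always runs before the limit.

```python
from operator import itemgetter


class LazyCursor:
    """Hypothetical, simplified stand-in for a query cursor (not pymongo's code).

    sort() and limit() only record options and return self, so they can be
    chained in any order. Nothing is evaluated until iteration, and then the
    sort is always applied before the limit.
    """

    def __init__(self, docs):
        self._docs = list(docs)
        self._sort = []  # list of (field, direction) pairs
        self._limit = 0  # 0 means "no limit", as in MongoDB

    def sort(self, key_list):
        self._sort = list(key_list)
        return self

    def limit(self, n):
        self._limit = n
        return self

    def __iter__(self):
        results = list(self._docs)
        # Sort on the least significant key first; Python's sort is stable,
        # so the first (field, direction) pair ends up taking priority.
        for field, direction in reversed(self._sort):
            results.sort(key=itemgetter(field), reverse=direction < 0)
        # The limit is applied only after sorting.
        if self._limit:
            results = results[: self._limit]
        return iter(results)


docs = [{"count": c} for c in [5, 3, 9, 1, 7]]
a = [d["count"] for d in LazyCursor(docs).sort([("count", 1)]).limit(2)]
b = [d["count"] for d in LazyCursor(docs).limit(2).sort([("count", 1)])]
print(a, b)  # [1, 3] [1, 3]
```

Both chains produce the same result because the order of the method calls only changes when the options are recorded, not when they are applied.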

disruptive asked Mar 27 '16 18:03




1 Answer

According to the documentation, regardless of which comes first in your chain of calls, sort() is always applied before limit().

You can also study the .explain() results of your query and look at the execution stages: you will find that the input stage of the sort examines all of the filtered documents (in your case, all documents in the collection), and only then is the limit applied.
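To read the stages without scanning the whole explain() dump, a small helper can walk the winning plan from the outermost stage inward. This is a sketch under stated assumptions: the function name stage_chain is hypothetical, it follows only single "inputStage" children (stages listed under "inputStages" are not followed), and it assumes that some newer servers nest the plan tree under a "queryPlan" key inside winningPlan. Applied to the winningPlan from the 3.0.7 server in this answer, it shows a COLLSCAN feeding every document into a SORT stage; the limit does not appear as a separate stage there but as limitAmount: 3 on the SORT stage itself.

```python
def stage_chain(plan):
    """List the winning plan's stages from the outermost (last to run) inward.

    Assumes each stage has at most one child under "inputStage"; stages with
    several children (under "inputStages") are not followed.
    """
    winning = plan["queryPlanner"]["winningPlan"]
    # Assumption: some newer servers nest the plan tree under "queryPlan".
    node = winning.get("queryPlan", winning)
    stages = []
    while node is not None:
        stages.append(node["stage"])
        node = node.get("inputStage")
    return stages


# The winningPlan reported by the 3.0.7 server in this answer.
plan = {
    "queryPlanner": {
        "winningPlan": {
            "inputStage": {"direction": "forward", "filter": {"$and": []}, "stage": "COLLSCAN"},
            "limitAmount": 3,
            "sortPattern": {"time": 1},
            "stage": "SORT",
        }
    }
}
print(stage_chain(plan))  # ['SORT', 'COLLSCAN']
```

Read bottom-up, the list is the execution order: scan the collection, then sort (keeping the top 3).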


Let's go through an example.

Imagine there is a foo database with a test collection having 6 documents:

>>> col = db.foo.test
>>> for doc in col.find():
...     print(doc)
{'time': '2016-03-28 12:12:00', '_id': ObjectId('56f9716ce4b05e6b92be87f2'), 'value': 90}
{'time': '2016-03-28 12:13:00', '_id': ObjectId('56f971a3e4b05e6b92be87fc'), 'value': 82}
{'time': '2016-03-28 12:14:00', '_id': ObjectId('56f971afe4b05e6b92be87fd'), 'value': 75}
{'time': '2016-03-28 12:15:00', '_id': ObjectId('56f971b7e4b05e6b92be87ff'), 'value': 72}
{'time': '2016-03-28 12:16:00', '_id': ObjectId('56f971c0e4b05e6b92be8803'), 'value': 81}
{'time': '2016-03-28 12:17:00', '_id': ObjectId('56f971c8e4b05e6b92be8806'), 'value': 90}

Now let's run the query with sort() and limit() chained in both orders and check the results and the explain plans.

Sort and then limit:

>>> from pprint import pprint
>>> cursor = col.find().sort([('time', 1)]).limit(3)
>>> sort_limit_plan = cursor.explain()
>>> pprint(sort_limit_plan)
{u'executionStats': {u'allPlansExecution': [],
                     u'executionStages': {u'advanced': 3,
                                          u'executionTimeMillisEstimate': 0,
                                          u'inputStage': {u'advanced': 6,
                                                          u'direction': u'forward',
                                                          u'docsExamined': 6,
                                                          u'executionTimeMillisEstimate': 0,
                                                          u'filter': {u'$and': []},
                                                          u'invalidates': 0,
                                                          u'isEOF': 1,
                                                          u'nReturned': 6,
                                                          u'needFetch': 0,
                                                          u'needTime': 1,
                                                          u'restoreState': 0,
                                                          u'saveState': 0,
                                                          u'stage': u'COLLSCAN',
                                                          u'works': 8},
                                          u'invalidates': 0,
                                          u'isEOF': 1,
                                          u'limitAmount': 3,
                                          u'memLimit': 33554432,
                                          u'memUsage': 213,
                                          u'nReturned': 3,
                                          u'needFetch': 0,
                                          u'needTime': 8,
                                          u'restoreState': 0,
                                          u'saveState': 0,
                                          u'sortPattern': {u'time': 1},
                                          u'stage': u'SORT',
                                          u'works': 13},
                     u'executionSuccess': True,
                     u'executionTimeMillis': 0,
                     u'nReturned': 3,
                     u'totalDocsExamined': 6,
                     u'totalKeysExamined': 0},
 u'queryPlanner': {u'indexFilterSet': False,
                   u'namespace': u'foo.test',
                   u'parsedQuery': {u'$and': []},
                   u'plannerVersion': 1,
                   u'rejectedPlans': [],
                   u'winningPlan': {u'inputStage': {u'direction': u'forward',
                                                    u'filter': {u'$and': []},
                                                    u'stage': u'COLLSCAN'},
                                    u'limitAmount': 3,
                                    u'sortPattern': {u'time': 1},
                                    u'stage': u'SORT'}},
 u'serverInfo': {u'gitVersion': u'6ce7cbe8c6b899552dadd907604559806aa2e9bd',
                 u'host': u'h008742.mongolab.com',
                 u'port': 53439,
                 u'version': u'3.0.7'}}

Limit and then sort:

>>> cursor = col.find().limit(3).sort([('time', 1)])
>>> limit_sort_plan = cursor.explain()
>>> pprint(limit_sort_plan)
{u'executionStats': {u'allPlansExecution': [],
                     u'executionStages': {u'advanced': 3,
                                          u'executionTimeMillisEstimate': 0,
                                          u'inputStage': {u'advanced': 6,
                                                          u'direction': u'forward',
                                                          u'docsExamined': 6,
                                                          u'executionTimeMillisEstimate': 0,
                                                          u'filter': {u'$and': []},
                                                          u'invalidates': 0,
                                                          u'isEOF': 1,
                                                          u'nReturned': 6,
                                                          u'needFetch': 0,
                                                          u'needTime': 1,
                                                          u'restoreState': 0,
                                                          u'saveState': 0,
                                                          u'stage': u'COLLSCAN',
                                                          u'works': 8},
                                          u'invalidates': 0,
                                          u'isEOF': 1,
                                          u'limitAmount': 3,
                                          u'memLimit': 33554432,
                                          u'memUsage': 213,
                                          u'nReturned': 3,
                                          u'needFetch': 0,
                                          u'needTime': 8,
                                          u'restoreState': 0,
                                          u'saveState': 0,
                                          u'sortPattern': {u'time': 1},
                                          u'stage': u'SORT',
                                          u'works': 13},
                     u'executionSuccess': True,
                     u'executionTimeMillis': 0,
                     u'nReturned': 3,
                     u'totalDocsExamined': 6,
                     u'totalKeysExamined': 0},
 u'queryPlanner': {u'indexFilterSet': False,
                   u'namespace': u'foo.test',
                   u'parsedQuery': {u'$and': []},
                   u'plannerVersion': 1,
                   u'rejectedPlans': [],
                   u'winningPlan': {u'inputStage': {u'direction': u'forward',
                                                    u'filter': {u'$and': []},
                                                    u'stage': u'COLLSCAN'},
                                    u'limitAmount': 3,
                                    u'sortPattern': {u'time': 1},
                                    u'stage': u'SORT'}},
 u'serverInfo': {u'gitVersion': u'6ce7cbe8c6b899552dadd907604559806aa2e9bd',
                 u'host': u'h008742.mongolab.com',
                 u'port': 53439,
                 u'version': u'3.0.7'}}

As you can see, in both cases the sort is applied first to all 6 documents, and the limit then cuts the result down to 3.
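Because the sample documents happen to be stored in time order, sorting on time returns the three earliest documents under either interpretation, so in this example only the explain plan tells the two apart. Sorting on value, which is not in insertion order, would make the difference visible in the results themselves. The plain-Python sketch below (no server involved) computes what each interpretation would return for the six sample values; given the sort-then-limit behaviour shown above, col.find().limit(3).sort([('value', 1)]) should return the documents with values 72, 75 and 81.

```python
# The six sample documents' "value" fields, in natural (insertion) order.
values = [90, 82, 75, 72, 81, 90]

# Sort first, then limit: what MongoDB does for either chaining order.
sort_then_limit = sorted(values)[:3]

# Limit first, then sort: what the question suspected was happening.
limit_then_sort = sorted(values[:3])

print(sort_then_limit)  # [72, 75, 81]
print(limit_then_sort)  # [75, 82, 90]
```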

And the execution plans are exactly the same:

>>> from copy import deepcopy  # just in case
>>> cursor = col.find().sort([('time', 1)]).limit(3)
>>> sort_limit_plan = deepcopy(cursor.explain())
>>> cursor = col.find().limit(3).sort([('time', 1)])
>>> limit_sort_plan = deepcopy(cursor.explain())
>>> sort_limit_plan == limit_sort_plan
True
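Comparing the whole explain() documents works here because the tiny collection made every timing 0. On a larger collection, runtime fields in executionStats such as executionTimeMillis can vary from run to run even when the chosen plan is identical. A sketch that compares only queryPlanner.winningPlan (same_winning_plan is a hypothetical helper, not part of pymongo; the two explain() results below are made-up illustrations):

```python
def same_winning_plan(plan_a, plan_b):
    """Return True if two explain() results chose the same winning plan.

    Runtime statistics (executionStats) and serverInfo are ignored on purpose.
    """
    return plan_a["queryPlanner"]["winningPlan"] == plan_b["queryPlanner"]["winningPlan"]


# Two hypothetical explain() results that differ only in timing.
run_1 = {"queryPlanner": {"winningPlan": {"stage": "SORT", "limitAmount": 3}},
         "executionStats": {"executionTimeMillis": 0}}
run_2 = {"queryPlanner": {"winningPlan": {"stage": "SORT", "limitAmount": 3}},
         "executionStats": {"executionTimeMillis": 4}}

print(run_1 == run_2)                   # False
print(same_winning_plan(run_1, run_2))  # True
```

With the variables from the session above, same_winning_plan(sort_limit_plan, limit_sort_plan) would perform the same check while ignoring timings.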

Also see:

  • How do you tell Mongo to sort a collection before limiting the results?
alecxe answered Oct 13 '22 00:10