Despite reading people's answers stating that the sort is done first, the evidence I see suggests the opposite: the limit is applied before the sort. Is there a way to force the sort to always happen first?
views = mongo.db.view_logging.find().sort([('count', 1)]).limit(10)
Whether I use .sort().limit() or .limit().sort(), the limit takes precedence. I wonder if this has something to do with pymongo.
...
Sorting with the limit() method
The sort() method can be used along with the limit() method, which limits the number of results returned by the query. You should pass an integer to the limit() method, which then specifies the number of documents to which the result set should be limited.
Use the sort() method to sort the result in ascending or descending order. The sort() method takes one parameter for "fieldname" and one parameter for "direction" (ascending is the default direction).
To sort the results of a query in ascending or descending order, pymongo provides the sort() method. To this method, pass the field name to sort on and a value representing the direction.
To sort documents in MongoDB, you need to use the sort() method. The method accepts a document containing a list of fields along with their sorting order. To specify the sorting order, 1 and -1 are used: 1 for ascending order and -1 for descending order.
According to the documentation, regardless of which goes first in your chain of commands, sort() is always applied before limit().
You can also study the .explain() results of your query and look at the execution stages. You will find that the sort stage's input stage examines all of the filtered documents (in your case, all documents in the collection), and only then is the limit applied.
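Reading a full explain document by eye gets tedious, so here is a minimal sketch of a helper that lists the stage names in a plan from the outermost stage inward. The function name stage_chain is hypothetical, and the sample dictionary is a trimmed copy of the winningPlan shape printed later in this answer (server 3.0.7); other server versions may nest stages differently, so treat this as an illustration rather than a general parser.

```python
def stage_chain(plan):
    """Return stage names from the outermost stage down to the innermost input stage.

    Assumes each stage has at most one child under the 'inputStage' key,
    which matches the plan shape shown in this answer.
    """
    stages = []
    node = plan
    while node is not None:
        stages.append(node.get("stage"))
        node = node.get("inputStage")
    return stages


# Trimmed copy of the winningPlan from the explain output in this answer.
winning_plan = {
    "stage": "SORT",
    "sortPattern": {"time": 1},
    "limitAmount": 3,
    "inputStage": {
        "stage": "COLLSCAN",
        "direction": "forward",
        "filter": {"$and": []},
    },
}

print(stage_chain(winning_plan))  # ['SORT', 'COLLSCAN']
```

In this plan the limit is not a separate stage: it shows up as limitAmount on the SORT stage, whose input is the full collection scan.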
Let's go through an example.
Imagine there is a foo database with a test collection having 6 documents:
>>> col = db.foo.test
>>> for doc in col.find():
...     print(doc)
...
{'time': '2016-03-28 12:12:00', '_id': ObjectId('56f9716ce4b05e6b92be87f2'), 'value': 90}
{'time': '2016-03-28 12:13:00', '_id': ObjectId('56f971a3e4b05e6b92be87fc'), 'value': 82}
{'time': '2016-03-28 12:14:00', '_id': ObjectId('56f971afe4b05e6b92be87fd'), 'value': 75}
{'time': '2016-03-28 12:15:00', '_id': ObjectId('56f971b7e4b05e6b92be87ff'), 'value': 72}
{'time': '2016-03-28 12:16:00', '_id': ObjectId('56f971c0e4b05e6b92be8803'), 'value': 81}
{'time': '2016-03-28 12:17:00', '_id': ObjectId('56f971c8e4b05e6b92be8806'), 'value': 90}
Now, let's execute queries with sort() and limit() chained in different orders and check the results and the explain plan.
Sort and then limit:
>>> from pprint import pprint
>>> cursor = col.find().sort([('time', 1)]).limit(3)
>>> sort_limit_plan = cursor.explain()
>>> pprint(sort_limit_plan)
{u'executionStats': {u'allPlansExecution': [],
                     u'executionStages': {u'advanced': 3,
                                          u'executionTimeMillisEstimate': 0,
                                          u'inputStage': {u'advanced': 6,
                                                          u'direction': u'forward',
                                                          u'docsExamined': 6,
                                                          u'executionTimeMillisEstimate': 0,
                                                          u'filter': {u'$and': []},
                                                          u'invalidates': 0,
                                                          u'isEOF': 1,
                                                          u'nReturned': 6,
                                                          u'needFetch': 0,
                                                          u'needTime': 1,
                                                          u'restoreState': 0,
                                                          u'saveState': 0,
                                                          u'stage': u'COLLSCAN',
                                                          u'works': 8},
                                          u'invalidates': 0,
                                          u'isEOF': 1,
                                          u'limitAmount': 3,
                                          u'memLimit': 33554432,
                                          u'memUsage': 213,
                                          u'nReturned': 3,
                                          u'needFetch': 0,
                                          u'needTime': 8,
                                          u'restoreState': 0,
                                          u'saveState': 0,
                                          u'sortPattern': {u'time': 1},
                                          u'stage': u'SORT',
                                          u'works': 13},
                     u'executionSuccess': True,
                     u'executionTimeMillis': 0,
                     u'nReturned': 3,
                     u'totalDocsExamined': 6,
                     u'totalKeysExamined': 0},
 u'queryPlanner': {u'indexFilterSet': False,
                   u'namespace': u'foo.test',
                   u'parsedQuery': {u'$and': []},
                   u'plannerVersion': 1,
                   u'rejectedPlans': [],
                   u'winningPlan': {u'inputStage': {u'direction': u'forward',
                                                    u'filter': {u'$and': []},
                                                    u'stage': u'COLLSCAN'},
                                    u'limitAmount': 3,
                                    u'sortPattern': {u'time': 1},
                                    u'stage': u'SORT'}},
 u'serverInfo': {u'gitVersion': u'6ce7cbe8c6b899552dadd907604559806aa2e9bd',
                 u'host': u'h008742.mongolab.com',
                 u'port': 53439,
                 u'version': u'3.0.7'}}
Limit and then sort:
>>> cursor = col.find().limit(3).sort([('time', 1)])
>>> limit_sort_plan = cursor.explain()
>>> pprint(limit_sort_plan)
{u'executionStats': {u'allPlansExecution': [],
                     u'executionStages': {u'advanced': 3,
                                          u'executionTimeMillisEstimate': 0,
                                          u'inputStage': {u'advanced': 6,
                                                          u'direction': u'forward',
                                                          u'docsExamined': 6,
                                                          u'executionTimeMillisEstimate': 0,
                                                          u'filter': {u'$and': []},
                                                          u'invalidates': 0,
                                                          u'isEOF': 1,
                                                          u'nReturned': 6,
                                                          u'needFetch': 0,
                                                          u'needTime': 1,
                                                          u'restoreState': 0,
                                                          u'saveState': 0,
                                                          u'stage': u'COLLSCAN',
                                                          u'works': 8},
                                          u'invalidates': 0,
                                          u'isEOF': 1,
                                          u'limitAmount': 3,
                                          u'memLimit': 33554432,
                                          u'memUsage': 213,
                                          u'nReturned': 3,
                                          u'needFetch': 0,
                                          u'needTime': 8,
                                          u'restoreState': 0,
                                          u'saveState': 0,
                                          u'sortPattern': {u'time': 1},
                                          u'stage': u'SORT',
                                          u'works': 13},
                     u'executionSuccess': True,
                     u'executionTimeMillis': 0,
                     u'nReturned': 3,
                     u'totalDocsExamined': 6,
                     u'totalKeysExamined': 0},
 u'queryPlanner': {u'indexFilterSet': False,
                   u'namespace': u'foo.test',
                   u'parsedQuery': {u'$and': []},
                   u'plannerVersion': 1,
                   u'rejectedPlans': [],
                   u'winningPlan': {u'inputStage': {u'direction': u'forward',
                                                    u'filter': {u'$and': []},
                                                    u'stage': u'COLLSCAN'},
                                    u'limitAmount': 3,
                                    u'sortPattern': {u'time': 1},
                                    u'stage': u'SORT'}},
 u'serverInfo': {u'gitVersion': u'6ce7cbe8c6b899552dadd907604559806aa2e9bd',
                 u'host': u'h008742.mongolab.com',
                 u'port': 53439,
                 u'version': u'3.0.7'}}
As you can see, in both cases the sort is applied first and covers all 6 documents (docsExamined is 6), and only then does the limit cut the result down to 3.
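To see what a "limit first" result would actually look like, it may help to simulate both orders in plain Python on the same six documents, with no server involved. The helper names sort_then_limit and limit_then_sort are hypothetical. Sorting by time would hide the difference here, because the documents are already stored in time order, so this sketch sorts by value in descending order instead.

```python
# The six documents from the example collection (without _id).
docs = [
    {"time": "2016-03-28 12:12:00", "value": 90},
    {"time": "2016-03-28 12:13:00", "value": 82},
    {"time": "2016-03-28 12:14:00", "value": 75},
    {"time": "2016-03-28 12:15:00", "value": 72},
    {"time": "2016-03-28 12:16:00", "value": 81},
    {"time": "2016-03-28 12:17:00", "value": 90},
]


def sort_then_limit(documents, key, n, descending=False):
    """Sort every document, then keep the first n (what the explain plan shows)."""
    return sorted(documents, key=lambda d: d[key], reverse=descending)[:n]


def limit_then_sort(documents, key, n, descending=False):
    """Keep the first n documents in stored order, then sort only those."""
    return sorted(documents[:n], key=lambda d: d[key], reverse=descending)


top = [d["value"] for d in sort_then_limit(docs, "value", 3, descending=True)]
first = [d["value"] for d in limit_then_sort(docs, "value", 3, descending=True)]

print(top)    # [90, 90, 82]
print(first)  # [90, 82, 75]
```

If a query returns something like the second list, that would point to a limit being applied before the sort; the first list is what the plans above describe.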
And, the execution plans are exactly the same:
>>> from copy import deepcopy  # just in case
>>> cursor = col.find().sort([('time', 1)]).limit(3)
>>> sort_limit_plan = deepcopy(cursor.explain())
>>> cursor = col.find().limit(3).sort([('time', 1)])
>>> limit_sort_plan = deepcopy(cursor.explain())
>>> sort_limit_plan == limit_sort_plan
True
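Comparing the whole explain document works here, but it also compares measured fields such as executionTimeMillis, which can differ from run to run. A slightly more robust check, sketched below under the assumption that the explain output has the queryPlanner.winningPlan layout shown above, compares only the winning plan. The function name same_winning_plan and the two sample dictionaries are hypothetical and trimmed for illustration.

```python
def same_winning_plan(plan_a, plan_b):
    """Compare only the chosen query plans, ignoring timing and server info."""
    return (
        plan_a["queryPlanner"]["winningPlan"]
        == plan_b["queryPlanner"]["winningPlan"]
    )


# Trimmed winning plan in the shape shown in the explain output above.
winning = {
    "stage": "SORT",
    "sortPattern": {"time": 1},
    "limitAmount": 3,
    "inputStage": {"stage": "COLLSCAN", "direction": "forward"},
}

# Two hypothetical explain results that differ only in a timing field.
explain_a = {
    "queryPlanner": {"winningPlan": dict(winning)},
    "executionStats": {"executionTimeMillis": 0},
}
explain_b = {
    "queryPlanner": {"winningPlan": dict(winning)},
    "executionStats": {"executionTimeMillis": 2},
}

print(explain_a == explain_b)                   # False
print(same_winning_plan(explain_a, explain_b))  # True
```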