According to the MongoDB Node driver docs, the aggregate function now returns a cursor (as of 2.6).
I hoped that I could use this to get a count of the matched items before any limit and skip are applied, but there doesn't seem to be any count function on the created cursor. If I run the same queries in the mongo shell, the cursor has an itcount function that I can call to get what I want.
I saw that the created cursor has an on('data') event (does that mean it's a CursorStream?), which seemed to fire the expected number of times, but if I use it in combination with cursor.get, no results get passed into the callback function.
Can the new cursor feature be used to count an aggregation query?
Edit for code:
In mongo shell:
> db.SentMessages.find({Type : 'Foo'})
{ "_id" : ObjectId("53ea19af9834184ad6d3675a"), "Name" : "123", "Type" : "Foo" }
{ "_id" : ObjectId("53ea19dd9834184ad6d3675c"), "Name" : "789", "Type" : "Foo" }
{ "_id" : ObjectId("53ea19d29834184ad6d3675b"), "Name" : "456", "Type" : "Foo" }
> db.SentMessages.find({Type : 'Foo'}).count()
3
> db.SentMessages.find({Type : 'Foo'}).limit(1)
{ "_id" : ObjectId("53ea19af9834184ad6d3675a"), "Name" : "123", "Type" : "Foo" }
> db.SentMessages.find({Type : 'Foo'}).limit(1).count();
3
> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}} ])
{ "_id" : ObjectId("53ea19af9834184ad6d3675a"), "Name" : "123", "Type" : "Foo" }
{ "_id" : ObjectId("53ea19dd9834184ad6d3675c"), "Name" : "789", "Type" : "Foo" }
{ "_id" : ObjectId("53ea19d29834184ad6d3675b"), "Name" : "456", "Type" : "Foo" }
> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}} ]).count()
2014-08-12T14:47:12.488+0100 TypeError: Object #<Object> has no method 'count'
> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}} ]).itcount()
3
> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}}, {$limit : 1} ])
{ "_id" : ObjectId("53ea19af9834184ad6d3675a"), "Name" : "123", "Type" : "Foo" }
> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}}, {$limit : 1} ]).itcount()
1
> exit
bye
In Node:
var cursor = collection.aggregate([ { $match : { Type : 'Foo'}}, {$limit : 1} ], { cursor : {}});
cursor.get(function(err, res){
    // res is as expected (1 doc)
});
cursor.count() does not exist
cursor.itcount() does not exist
The on data event exists:
cursor.on('data', function(){
    totalItems++;
});
but when used in combination with cursor.get, the .get callback function now contains 0 docs
Edit 2: The cursor returned appears to be an aggregation cursor rather than one of the cursors listed in the docs
This possibly deserves a full explanation for those who might search for this, so adding one for posterity.
Specifically, what is returned is an Event Stream for node.js, which effectively wraps the stream.Readable interface with a couple of convenience methods. A .count() is not one of them at present and, considering the current interface used, it would not make much sense.
Similar to the result returned from the .stream() method available to cursor objects, a "count" would not make much sense here when you consider the implementation, as it is meant to process as a "stream" where eventually you are going to reach an "end", but otherwise you just want to process until getting there.
If you considered the standard "Cursor" interface from the driver, there are some solid reasons why the aggregation cursor is not the same:
Cursors allow "modifier" actions to be processed prior to execution. These fall into the categories of .sort(), .limit() and .skip(). All of these have counterpart directives in the aggregation framework that are specified in the pipeline. As pipeline stages that could appear "anywhere", and not just as a post-processing option to a simple query, it would not make much sense to offer the same "cursor" processing.
Other cursor modifiers include specials like .hint(), .min() and .max(), which are alterations to "index selection" and processing. While these could be of use to the aggregation pipeline, there is currently no simple way to include them in query selection. Mostly, the logic from the previous point overrides any point of using the same type of interface for a "Cursor".
The other considerations are what you actually want to do with a cursor and why you "want" one returned. Since a cursor is usually a "one way trip", in the sense that it is only processed until an end is reached and in usable "batches", a reasonable conclusion is that the "count" actually comes at the end, when that "queue" is finally depleted.
While it is true that the standard "cursor" implementation holds some tricks, the main reason is that this just extends a "meta data" concept, as the query profiling engine must "scan" a certain number of documents in order to determine which items to return in the result.
The aggregation framework plays with this concept a little, though. Not only are there the same results as would be processed through the standard query profiler, but there are also additional stages. Any of these stages has the potential to "modify" the resulting "count" that would actually be returned in the "stream" to be processed.
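As a toy illustration of how a stage can change the count, here a $unwind is simulated over plain arrays (hypothetical docs with a Tags array): two documents match, but five reach the end of the pipeline, so any count recorded at match time is stale by the time results stream out.

```javascript
// Two matched docs, but $unwind on the Tags array produces five results.
var matched = [
  { Name: '123', Tags: ['a', 'b'] },
  { Name: '456', Tags: ['c', 'd', 'e'] }
];

// Minimal stand-in for the $unwind stage: one output doc per array element.
function $unwind(input, field) {
  var out = [];
  input.forEach(function (doc) {
    doc[field].forEach(function (value) {
      out.push({ Name: doc.Name, Tag: value });
    });
  });
  return out;
}

var unwound = $unwind(matched, 'Tags');
console.log(matched.length, unwound.length); // 2 5
```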
Again, you could look at this from an academic point of view and say, "Sure, the query engine should keep the 'meta data' for the count, but can we not track what is modified after?". That would be a fair argument, and pipeline operators such as $match and $group or $unwind, and possibly even $project and the new $redact, could all be considered a reasonable case for keeping their own track of the "documents processed" in each pipeline stage, updating the "meta data" that could possibly be returned to explain the full pipeline result count.
The last argument is reasonable, but consider also that at the present time the implementation of a "Cursor" concept for the aggregation pipeline results is a new concept for MongoDB. It could be fairly argued that all "reasonable" expectations at the first design point would have been that "most" results from combining documents would not be of a size that was restrictive to the BSON limitations. But as usage expands then perceptions are altered and things change to adapt.
So this "could" possibly be changed, but it is not how it is "currently" implemented. While .count() on a standard cursor implementation has access to the "meta data" where the scanned number is recorded, any method on the current implementation would result in retrieving all of the cursor results, just as .itcount() does in the shell.
The practical approach is to process the "cursor" items by counting on the "data" event and emitting something (possibly a JSON stream generator) as the "count" at the end. Any use case that requires a count "up-front" would not seem like a valid use for a cursor anyway, as surely the output would then be a whole document of a reasonable size.