According to the MongoDB Node driver docs, the aggregate function now returns a cursor (as of 2.6).
I hoped that I could use this to get a count of the matched items before any limit and skip are applied, but there doesn't seem to be any count function on the created cursor. If I run the same queries in the mongo shell, the cursor has an itcount function that I can call to get what I want.
I saw that the created cursor has an on('data') event (does that mean it's a CursorStream?), which seemed to fire the expected number of times, but if I use it in combination with cursor.get, no results get passed into the callback function.
Can the new cursor feature be used to count an aggregation query?
Edit for code:
In mongo shell:
> db.SentMessages.find({Type : 'Foo'})
{ "_id" : ObjectId("53ea19af9834184ad6d3675a"), "Name" : "123", "Type" : "Foo" }
{ "_id" : ObjectId("53ea19dd9834184ad6d3675c"), "Name" : "789", "Type" : "Foo" }
{ "_id" : ObjectId("53ea19d29834184ad6d3675b"), "Name" : "456", "Type" : "Foo" }
> db.SentMessages.find({Type : 'Foo'}).count()
3
> db.SentMessages.find({Type : 'Foo'}).limit(1)
{ "_id" : ObjectId("53ea19af9834184ad6d3675a"), "Name" : "123", "Type" : "Foo" }
> db.SentMessages.find({Type : 'Foo'}).limit(1).count();
3
> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}} ])
{ "_id" : ObjectId("53ea19af9834184ad6d3675a"), "Name" : "123", "Type" : "Foo" }
{ "_id" : ObjectId("53ea19dd9834184ad6d3675c"), "Name" : "789", "Type" : "Foo" }
{ "_id" : ObjectId("53ea19d29834184ad6d3675b"), "Name" : "456", "Type" : "Foo" }
> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}} ]).count()
2014-08-12T14:47:12.488+0100 TypeError: Object #<Object> has no method 'count'
> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}} ]).itcount()
3
> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}}, {$limit : 1} ])
{ "_id" : ObjectId("53ea19af9834184ad6d3675a"), "Name" : "123", "Type" : "Foo" }
> db.SentMessages.aggregate([ { $match : { Type : 'Foo'}}, {$limit : 1} ]).itcount()
1
> exit
bye
In Node:
var cursor = collection.aggregate([ { $match : { Type : 'Foo'}}, {$limit : 1} ], { cursor : {}});
cursor.get(function(err, res){
    // res is as expected (1 doc)
});
cursor.count() does not exist
cursor.itcount() does not exist
The on data event exists:
cursor.on('data', function(){
    totalItems++;
});
but when used in combination with cursor.get, the .get callback function now contains 0 docs
Edit 2: The cursor returned appears to be an aggregation cursor rather than one of the cursors listed in the docs
This possibly deserves a full explanation for those who might search for this, so adding one for posterity.
Specifically, what is returned is an Event Stream for node.js, which effectively wraps the stream.Readable interface with a couple of convenience methods. A .count() is not one of them at present and, considering the current interface used, it would not make much sense.
Similar to the result returned from the .stream() method available to cursor objects, a "count" would not make much sense here when you consider the implementation, as it is meant to process as a "stream" where eventually you are going to reach an "end", but otherwise you just want to process until getting there.
If you considered the standard "Cursor" interface from the driver, there are some solid reasons why the aggregation cursor is not the same:
Cursors allow "modifier" actions to be processed prior to execution. These fall into the categories of .sort(), .limit() and .skip(). All of these have counterpart directives in the aggregation framework that are specified in the pipeline. As pipeline stages that could appear "anywhere", and not just as a post-processing option to a simple query, it would not make much sense to offer the same "cursor" processing.
Other cursor modifiers include specials like .hint(), .min() and .max(), which are alterations to "index selection" and processing. While these could be of use to the aggregation pipeline, there is currently no simple way to include them in query selection. Mostly, the logic from the previous point overrides any point of using the same type of interface for a "Cursor".
The other considerations are what you actually want to do with a cursor and why you "want" one returned. Since a cursor is usually a "one way trip", in the sense that it is only processed until an end is reached and in usable "batches", a reasonable conclusion is that the "count" actually comes at the end, when that "queue" is finally depleted.
While it is true that the standard "cursor" implementation holds some tricks, the main reason is that this just extends a "meta data" concept, as the query profiling engine must "scan" a certain number of documents in order to determine which items to return in the result.
The aggregation framework plays with this concept a little, though. Not only are there the same results as would be processed through the standard query profiler, but there are also additional stages. Any of these stages has the potential to "modify" the resulting "count" that would actually be returned in the "stream" to be processed.
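As a toy illustration of how a stage can change the count, here a $unwind is simulated over plain arrays (hypothetical docs with a Tags array): two documents match, but five reach the end of the pipeline, so any count recorded at match time is stale by the time results stream out.

```javascript
// Two matched docs, but $unwind on the Tags array produces five results.
var matched = [
  { Name: '123', Tags: ['a', 'b'] },
  { Name: '456', Tags: ['c', 'd', 'e'] }
];

// Minimal stand-in for the $unwind stage: one output doc per array element.
function $unwind(input, field) {
  var out = [];
  input.forEach(function (doc) {
    doc[field].forEach(function (value) {
      out.push({ Name: doc.Name, Tag: value });
    });
  });
  return out;
}

var unwound = $unwind(matched, 'Tags');
console.log(matched.length, unwound.length); // 2 5
```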
Again, you could look at this from an academic point of view and say, "Sure, the query engine should keep the 'meta data' for the count, but can we not track what is modified after?". That would be a fair argument, and pipeline operators such as $match and $group or $unwind, and possibly even $project and the new $redact, could all be considered a reasonable case for keeping their own track of the "documents processed" in each pipeline stage, updating the "meta data" that could possibly be returned to explain the full pipeline result count.
The last argument is reasonable, but consider also that at the present time the implementation of a "Cursor" concept for the aggregation pipeline results is a new concept for MongoDB. It could be fairly argued that all "reasonable" expectations at the first design point would have been that "most" results from combining documents would not be of a size that was restrictive to the BSON limitations. But as usage expands then perceptions are altered and things change to adapt.
So this "could" possibly be changed, but it is not how it is "currently" implemented. While .count() on a standard cursor implementation has access to the "meta data" where the scanned number is recorded, any method on the current implementation would result in retrieving all of the cursor results, just as .itcount() does in the shell.
The practical approach is to process the "cursor" items by counting on the "data" event and emitting something (possibly a JSON stream generator) as the "count" at the end. Any use case that requires a count "up-front" would not seem like a valid use for a cursor anyway, as surely the output would then be a whole document of a reasonable size.