
What is a Cursor in MongoDB?

We occasionally get "cursor not found" exceptions from some Morphia queries that call asList(), and I found a hint on SO that this might consume a lot of memory.

Now I'd like to know a bit more about the background: can somebody explain (in plain English) what a cursor in MongoDB actually is? Why can it be kept open, and why might it not be found?


The documentation defines a cursor as:

A pointer to the result set of a query. Clients can iterate through a cursor to retrieve results. By default, cursors timeout after 10 minutes of inactivity

But this doesn't tell me much. Maybe it would help to define what a batch of query results is, because the documentation also states:

The MongoDB server returns the query results in batches. Batch size will not exceed the maximum BSON document size. For most queries, the first batch returns 101 documents or just enough documents to exceed 1 megabyte. Subsequent batch size is 4 megabytes. [...] For queries that include a sort operation without an index, the server must load all the documents in memory to perform the sort before returning any results.

Note: the queries in question don't use sort at all, but they also don't use limit or offset.

BairDev asked Apr 21 '16 10:04




1 Answer

Here's a comparison between toArray() and cursors after a find() in the Node.js MongoDB driver. Common code:

    var MongoClient = require('mongodb').MongoClient,
        assert = require('assert');

    MongoClient.connect('mongodb://localhost:27017/crunchbase', function (err, db) {
        assert.equal(err, null);
        console.log('Successfully connected to MongoDB.');

        const query = { category_code: "biotech" };

        // toArray() vs. cursor code goes here
    });

Here's the toArray() code that goes in the section above.

    db.collection('companies').find(query).toArray(function (err, docs) {
        assert.equal(err, null);
        assert.notEqual(docs.length, 0);

        docs.forEach(doc => {
            console.log(`${doc.name} is a ${doc.category_code} company.`);
        });

        db.close();
    });

Per the documentation,

The caller is responsible for making sure that there is enough memory to store the results.

Here's the cursor-based approach, using the cursor.forEach() method:

    const cursor = db.collection('companies').find(query);

    cursor.forEach(
        // Called once per document as the results arrive.
        function (doc) {
            console.log(`${doc.name} is a ${doc.category_code} company.`);
        },
        // Called once when iteration ends; err is null on success.
        function (err) {
            assert.equal(err, null);
            return db.close();
        }
    );

With the forEach() approach, instead of fetching all the data into memory, we stream it to our application. find() returns a cursor immediately, because it does not send a request to the database until we try to use some of the documents it will provide. Until then, the cursor merely describes our query. The second parameter to cursor.forEach() is a callback that runs when iteration finishes or when an error occurs.
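To make the "nothing is requested until you ask for a document" idea concrete, here is a minimal sketch in plain JavaScript. It does not use the MongoDB driver: `ToyCursor`, its `fetchBatch()` method and the `roundTrips` counter are hypothetical names invented for illustration, and the fixed batch size of 4 is an assumption chosen to keep the output short.

```javascript
// Toy stand-in for a cursor; NOT the MongoDB driver API.
class ToyCursor {
  constructor(docs, batchSize) {
    this.docs = docs;           // pretend these documents live on the server
    this.batchSize = batchSize; // documents per simulated round trip
    this.position = 0;          // how many documents the "server" has sent
    this.buffer = [];           // the batch currently held by the "client"
    this.roundTrips = 0;        // number of simulated requests so far
  }

  // Simulates one request to the server for the next batch.
  fetchBatch() {
    this.roundTrips += 1;
    this.buffer = this.docs.slice(this.position, this.position + this.batchSize);
    this.position += this.buffer.length;
  }

  // Returns the next document, or null once the cursor is exhausted.
  next() {
    if (this.buffer.length === 0 && this.position < this.docs.length) {
      this.fetchBatch();
    }
    return this.buffer.length > 0 ? this.buffer.shift() : null;
  }
}

const docs = Array.from({ length: 10 }, (_, i) => ({ name: `company-${i}` }));
const cursor = new ToyCursor(docs, 4);

console.log(cursor.roundTrips);             // 0: creating the cursor sent nothing
const first = cursor.next();
console.log(first.name, cursor.roundTrips); // company-0 1
```

Only the first call to `next()` triggers a simulated request; the following three documents are served from the buffered batch without another round trip.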

In the toArray() version of the code above, it was the toArray() call that forced the database request. It told the driver that we needed ALL the documents and wanted them in an array.

Note that MongoDB returns data in batches. The image below shows requests from cursors (from application) to MongoDB:

[Image: MongoDB cursor graphic]

forEach() scales better than toArray() because we can process documents as they come in until we reach the end. Contrast that with toArray(), where we wait for ALL the documents to be retrieved and the entire array to be built. With toArray() we get no advantage from the fact that the driver and the database work together to batch results to the application. Batching is meant to reduce memory overhead and execution time. Take advantage of it in your application if you can.
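The memory difference can be illustrated with a small simulation, again in plain JavaScript rather than the real driver. The helpers `batches()`, `collectAll()` and `streamAll()` are hypothetical, and the constant batch size of 101 documents is a simplifying assumption (the server sizes later batches by bytes, not by document count).

```javascript
// Hypothetical helper: yields documents batch by batch, roughly as a cursor would.
function* batches(total, batchSize) {
  for (let start = 0; start < total; start += batchSize) {
    const size = Math.min(batchSize, total - start);
    yield Array.from({ length: size }, (_, i) => ({ id: start + i }));
  }
}

// toArray()-style: every batch is appended, so all documents are held at once.
function collectAll(total, batchSize) {
  const all = [];
  for (const batch of batches(total, batchSize)) {
    all.push(...batch);
  }
  return { processed: all.length, peakHeld: all.length };
}

// forEach()-style: each batch is processed and dropped before the next one arrives.
function streamAll(total, batchSize, handle) {
  let processed = 0;
  let peakHeld = 0;
  for (const batch of batches(total, batchSize)) {
    peakHeld = Math.max(peakHeld, batch.length);
    for (const doc of batch) {
      handle(doc);
      processed += 1;
    }
  }
  return { processed, peakHeld };
}

const collected = collectAll(1000, 101);
const streamed = streamAll(1000, 101, () => {});
console.log(collected); // { processed: 1000, peakHeld: 1000 }
console.log(streamed);  // { processed: 1000, peakHeld: 101 }
```

Both variants process the same 1000 documents, but the streaming variant never holds more than one batch at a time, which is the property the paragraph above attributes to forEach().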

xameeramir answered Oct 01 '22 06:10