
How can I use a cursor.forEach() in MongoDB using Node.js?

I have a huge collection of documents in my DB, and I'm wondering how I can iterate over all of them and update each document with a different value.

Alex Brodov asked Aug 26 '14 14:08



2 Answers

The answer depends on the driver you're using. All MongoDB drivers I know have cursor.forEach() implemented one way or another.

Here are some examples:

node-mongodb-native

    collection.find(query).forEach(function(doc) {
      // handle
    }, function(err) {
      // done or error
    });

mongojs

    db.collection.find(query).forEach(function(err, doc) {
      // handle
    });

monk

    collection.find(query, { stream: true })
      .each(function(doc) {
        // handle doc
      })
      .error(function(err) {
        // handle error
      })
      .success(function() {
        // final callback
      });

mongoose

    collection.find(query).stream()
      .on('data', function(doc) {
        // handle doc
      })
      .on('error', function(err) {
        // handle error
      })
      .on('end', function() {
        // final callback
      });

Updating documents inside of .forEach callback

The only problem with updating documents inside a .forEach callback is that you have no way of knowing when all the documents have been updated.

To solve this problem, you should use an asynchronous control flow solution. Here are some options:

  • async
  • promises (when.js, bluebird)

Here is an example that uses the queue feature of async:

    var q = async.queue(function (doc, callback) {
      // code for your update
      collection.update({
        _id: doc._id
      }, {
        $set: {hi: 'there'}
      }, {
        w: 1
      }, callback);
    }, Infinity);

    var cursor = collection.find(query);
    cursor.each(function(err, doc) {
      if (err) throw err;
      if (doc) q.push(doc); // dispatching doc to async.queue
    });

    q.drain = function() {
      if (cursor.isClosed()) {
        console.log('all items have been processed');
        db.close();
      }
    }
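The promises option listed above can be sketched without an extra library, assuming a driver whose cursors are async iterable and whose update methods return promises (as in recent node-mongodb-native releases). The makeFakeCollection helper and the updateAll function are hypothetical names; the fake collection exists only so the sketch runs without a MongoDB server. This is one possible approach under those assumptions, not the method used in the answer above.

```javascript
// Hypothetical in-memory stand-in so the sketch runs without a MongoDB server.
// With the real driver, `collection` would come from db.collection(...).
function makeFakeCollection(docs) {
  return {
    docs,
    // Returns an async iterable, similar to the driver's cursor.
    find() {
      const snapshot = docs.slice();
      return {
        async *[Symbol.asyncIterator]() {
          for (const doc of snapshot) yield doc;
        },
      };
    },
    // Mimics updateOne for the { _id } filter and $set update used below.
    async updateOne(filter, update) {
      const doc = docs.find((d) => d._id === filter._id);
      if (doc) Object.assign(doc, update.$set);
      return { matchedCount: doc ? 1 : 0 };
    },
  };
}

// Iterates the cursor and runs updates in batches of `batchSize`.
// The returned promise resolves only after every update has completed.
async function updateAll(collection, query, batchSize = 100) {
  let pending = [];
  let processed = 0;
  for await (const doc of collection.find(query)) {
    pending.push(
      collection.updateOne({ _id: doc._id }, { $set: { hi: 'there' } })
    );
    if (pending.length >= batchSize) {
      await Promise.all(pending); // wait for the current batch
      processed += pending.length;
      pending = [];
    }
  }
  await Promise.all(pending); // wait for the final partial batch
  return processed + pending.length;
}

const demo = makeFakeCollection([{ _id: 1 }, { _id: 2 }, { _id: 3 }]);
const done = updateAll(demo, {}, 2).then((count) => {
  console.log('all items have been processed:', count);
  return count;
});
```

Because updateAll returns a promise, the caller knows exactly when every update has finished, and batchSize bounds how many updates are in flight at once.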
Leonid Beschastny answered Oct 13 '22 06:10


With the mongodb driver and a modern Node.js version that supports async/await, a good solution is to use next():

    const collection = db.collection('things');
    const cursor = collection.find({
      bla: 42 // find all things where bla is 42
    });
    let document;
    while ((document = await cursor.next())) {
      await collection.findOneAndUpdate({
        _id: document._id
      }, {
        $set: {
          blu: 43
        }
      });
    }

With this approach, only one document at a time needs to be held in memory, unlike the accepted answer, where many documents are pulled into memory before processing starts. For huge collections (as in the question), this can matter.

If documents are large, this can be improved further by using a projection, so that only those fields of documents that are required are fetched from the database.
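A minimal sketch of the projection idea, assuming the driver's find(query, { projection }) option and the same next() loop as above. The makeFakeThings helper and the updateThings function are hypothetical; the fake collection supports only simple equality queries and inclusion projections, and exists only so the sketch runs without a MongoDB server.

```javascript
// Hypothetical in-memory stand-in so the sketch runs without a MongoDB server.
function makeFakeThings(docs) {
  return {
    docs,
    // Supports equality queries and an inclusion projection such as { _id: 1 }.
    find(query, options = {}) {
      const fields = Object.keys(options.projection || {});
      const matches = docs
        .filter((doc) => Object.entries(query).every(([k, v]) => doc[k] === v))
        .map((doc) => {
          if (fields.length === 0) return { ...doc };
          const picked = {};
          for (const field of fields) {
            if (field in doc) picked[field] = doc[field];
          }
          return picked;
        });
      let index = 0;
      // next() resolves to null once the results are exhausted.
      return {
        next: async () => (index < matches.length ? matches[index++] : null),
      };
    },
    // Mimics updateOne for the { _id } filter and $set update used below.
    async updateOne(filter, update) {
      const doc = docs.find((d) => d._id === filter._id);
      if (doc) Object.assign(doc, update.$set);
    },
  };
}

// Fetches only _id for each match, then updates the matches one at a time.
async function updateThings(collection) {
  const cursor = collection.find({ bla: 42 }, { projection: { _id: 1 } });
  const fetchedFields = new Set();
  let document;
  while ((document = await cursor.next())) {
    Object.keys(document).forEach((key) => fetchedFields.add(key));
    await collection.updateOne({ _id: document._id }, { $set: { blu: 43 } });
  }
  return [...fetchedFields]; // the field names that were actually fetched
}

const things = makeFakeThings([
  { _id: 1, bla: 42, payload: 'large field' },
  { _id: 2, bla: 7, payload: 'large field' },
  { _id: 3, bla: 42, payload: 'large field' },
]);
const finished = updateThings(things);
```

Only the _id field travels through the cursor here, so the size of the payload field does not affect how much memory each iteration needs.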

chris6953 answered Oct 13 '22 05:10
