I have a huge collection of documents in my DB and I'm wondering how I can run through all the documents and update them, each document with a different value.
MongoDB also allows you to iterate a cursor manually. To do so, simply assign the cursor returned by the find() method to a variable using the var keyword (or any JavaScript variable). Note: if a cursor is inactive for 10 minutes, the MongoDB server will automatically close it.
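For instance, a minimal sketch in the mongo shell (the users collection and the status query are just placeholders):

var myCursor = db.users.find({ status: "active" }); // hypothetical collection and query
while (myCursor.hasNext()) {
  printjson(myCursor.next()); // advance the cursor manually and print each document
}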
A cursor is the object returned when the find() method executes: a handle over the matching documents rather than the documents themselves. By default it is iterated automatically as a loop, but you can also explicitly retrieve the document at a specific index from the returned cursor. It is much like a pointer, pointing at a specific index in the result set.
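One way to get the document at a specific index is to convert the cursor to an array first. A mongo shell sketch, again with a placeholder collection:

var myCursor = db.users.find({ status: "active" }); // hypothetical collection and query
var documentArray = myCursor.toArray();             // exhausts the cursor into an array
var fourthDocument = documentArray[3];              // index access, like a pointer offset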
cursor.batchSize(size) specifies the number of documents to return in each batch of the response from the MongoDB instance. In most cases, modifying the batch size will not affect the user or the application, as the mongo shell and most drivers return results as if MongoDB had returned a single batch.
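For example (mongo shell sketch; the inventory collection is a placeholder):

var myCursor = db.inventory.find().batchSize(100); // server returns at most 100 documents per batch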
Closing the cursor is only really required when you do not "exhaust" the results, that is, when you do not iterate over all the possible results returned by the cursor. Leaving a cursor open is like leaving a connection open that never gets re-used.
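If you only consume part of the results, you can close the cursor explicitly. A sketch with the Node.js driver, assuming collection and query are already defined and this runs inside an async function:

const cursor = collection.find(query);
const firstDoc = await cursor.next(); // consume only the first result
await cursor.close();                 // release the server-side cursor instead of leaving it open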
The answer depends on the driver you're using. All MongoDB drivers I know have cursor.forEach() implemented one way or another.
Here are some examples:
collection.find(query).forEach(function(doc) {
  // handle
}, function(err) {
  // done or error
});

db.collection.find(query).forEach(function(err, doc) {
  // handle
});

collection.find(query, { stream: true })
  .each(function(doc){
    // handle doc
  })
  .error(function(err){
    // handle error
  })
  .success(function(){
    // final callback
  });

collection.find(query).stream()
  .on('data', function(doc){
    // handle doc
  })
  .on('error', function(err){
    // handle error
  })
  .on('end', function(){
    // final callback
  });
The only problem with updating documents inside of a .forEach callback is that you have no idea when all the documents have been updated.
To solve this problem you should use some asynchronous control-flow solution, such as the async library or promises. Here is an example using async and its queue feature:
var q = async.queue(function (doc, callback) {
  // code for your update
  collection.update({ _id: doc._id }, { $set: { hi: 'there' } }, { w: 1 }, callback);
}, Infinity);

var cursor = collection.find(query);
cursor.each(function(err, doc) {
  if (err) throw err;
  if (doc) q.push(doc); // dispatching doc to async.queue
});

q.drain = function() {
  if (cursor.isClosed()) {
    console.log('all items have been processed');
    db.close();
  }
}
Using the mongodb driver, and modern NodeJS with async/await, a good solution is to use next():
const collection = db.collection('things')
const cursor = collection.find({
  bla: 42 // find all things where bla is 42
});
let document;
while ((document = await cursor.next())) {
  await collection.findOneAndUpdate({
    _id: document._id
  }, {
    $set: { blu: 43 }
  });
}
This results in only one document at a time being held in memory, as opposed to e.g. the accepted answer, where many documents get sucked into memory before processing of the documents starts. In the case of "huge collections" (as per the question) this may be important.
If documents are large, this can be improved further by using a projection, so that only those fields that are actually required are fetched from the database.
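For example, building on the snippet above, a sketch that projects only the _id field, which is the only field the update needs (this assumes a modern driver version where the projection is passed via the projection option):

const cursor = collection.find(
  { bla: 42 },
  { projection: { _id: 1 } } // fetch only _id; all other fields stay on the server
);
let document;
while ((document = await cursor.next())) {
  await collection.findOneAndUpdate({ _id: document._id }, { $set: { blu: 43 } });
}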