
What is the right approach to update many records in MongoDB using Mongoose

I am pulling some records from MongoDB using Mongoose and importing them into another system. Afterwards I would like to set the status (a document attribute) of all these documents to processed.

I could find this solution: Update multiple documents by id set. Mongoose

I was wondering whether that is the right approach: build up a criterion consisting of all document ids and then perform the update. Please also take into account the fact that there are going to be many documents.

(What is the limit of the update query? Couldn't find it anywhere. Official documentation: http://mongoosejs.com/docs/2.7.x/docs/updating-documents.html)

Ondrej Tokar asked Aug 03 '16 11:08



2 Answers

Building up a criterion consisting of all document ids and then performing the update can cause problems on a large dataset. If you iterate over a list of documents and send one update operation per document, you risk overwhelming your server, because in Mongoose you are not waiting for each asynchronous call to complete before moving on to the next iteration. You are essentially building a "stack" of unresolved operations until it causes a problem: a stack overflow.

For example, suppose you have an array of document ids and want to update the status field of each matching document:

    const processedIds = [
      "57a0a96bd1c6ef24376477cd",
      "57a052242acf5a06d4996537",
      "57a052242acf5a06d4996538"
    ];

You can use the updateMany() method:

    Model.updateMany(
      { _id: { $in: processedIds } },
      { $set: { status: "processed" } },
      callback
    );

Alternatively, for really small datasets, you could use the forEach() method on the array to iterate over it and update your collection one document at a time:

    processedIds.forEach(function (id) {
      Model.update({ _id: id }, { $set: { status: "processed" } }, callback);
    });

The above is okay for small datasets. However, it becomes an issue when you have thousands or millions of documents to update, because the loop fires off one asynchronous server call per document without waiting for any of them.

To overcome this, use something like async's eachLimit to iterate over the array, performing a MongoDB update operation for each item while never running more than x updates in parallel at the same time.
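The idea of capping the number of parallel updates can be illustrated without any library. The following is only a minimal sketch, not async's actual implementation: `eachWithLimit` is a hypothetical helper, and `fakeUpdateOne` is a stand-in for a real call such as a Mongoose update, so that the snippet runs without a database.

```javascript
// Minimal sketch of a concurrency-limited loop, similar in spirit to eachLimit.
// eachWithLimit is a hypothetical helper, not part of Mongoose or async.
async function eachWithLimit(items, limit, worker) {
  let next = 0; // index of the next item to hand out

  // Each runner takes one item at a time until none are left.
  async function runner() {
    while (next < items.length) {
      const item = items[next++];
      await worker(item);
    }
  }

  // Start at most `limit` runners; each keeps a single update in flight.
  const runners = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    runners.push(runner());
  }
  await Promise.all(runners);
}

// Stand-in for the database call (assumption): records how many
// updates are in flight at once instead of talking to MongoDB.
let inFlight = 0;
let maxInFlight = 0;
const updatedIds = [];

function fakeUpdateOne(id) {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  return new Promise(function (resolve) {
    setTimeout(function () {
      inFlight--;
      updatedIds.push(id);
      resolve();
    }, 5);
  });
}

const idsToUpdate = ["a", "b", "c", "d", "e", "f", "g"];
const limitedRun = eachWithLimit(idsToUpdate, 3, fakeUpdateOne);
```

In real code the worker would return the promise of the actual update call. The limit trades throughput against load on the server, so it is worth tuning rather than guessing.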


The best approach is to use the bulk API, which is extremely efficient at processing updates in bulk. The performance difference compared with calling the update operation on every single document is that, instead of sending an update request to the server on each iteration, the bulk API sends the requests in batches, once for every 1000 operations.

For Mongoose versions >=4.3.0, which support MongoDB Server 3.2.x, you can use bulkWrite() for updates. The following example shows how to go about this:

    const bulkUpdateCallback = function (err, r) {
      if (err) return console.error(err);
      console.log(r.matchedCount);
      console.log(r.modifiedCount);
    };

    // Initialize the bulk operations array (reassigned below, so use let)
    let bulkUpdateOps = [];
    let counter = 0;

    processedIds.forEach(function (id) {
      // Note: Model.collection bypasses Mongoose casting, so the ids
      // must already have the same type as the stored _id values
      bulkUpdateOps.push({
        updateOne: {
          filter: { _id: id },
          update: { $set: { status: "processed" } }
        }
      });
      counter++;

      if (counter % 500 === 0) {
        // Get the underlying collection via the Node.js driver collection object
        Model.collection.bulkWrite(bulkUpdateOps, { ordered: true, w: 1 }, bulkUpdateCallback);
        bulkUpdateOps = []; // re-initialize
      }
    });

    // Flush any remaining bulk ops
    if (counter % 500 !== 0) {
      Model.collection.bulkWrite(bulkUpdateOps, { ordered: true, w: 1 }, bulkUpdateCallback);
    }
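The batching logic can also be separated from the database call and run one batch at a time, waiting for each batch to finish before sending the next. This is only a sketch under stated assumptions: `chunk`, `buildOps` and `updateInBatches` are hypothetical helper names, `fakeCollection` is a stand-in for `Model.collection`, and the code assumes that `bulkWrite()` returns a promise when no callback is passed.

```javascript
// Split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Build one updateOne operation per id, in the shape bulkWrite() expects.
function buildOps(ids, status) {
  return ids.map(function (id) {
    return {
      updateOne: {
        filter: { _id: id },
        update: { $set: { status: status } }
      }
    };
  });
}

// Send the batches one after another, waiting for each to complete.
// `collection` is anything with a promise-returning bulkWrite(ops, options).
async function updateInBatches(collection, ids, status, batchSize) {
  let modified = 0;
  for (const batch of chunk(ids, batchSize)) {
    const result = await collection.bulkWrite(buildOps(batch, status), { ordered: true });
    modified += result.modifiedCount;
  }
  return modified;
}

// Stand-in for Model.collection (assumption) so the sketch runs
// without a database; it only records the size of each batch.
const fakeCollection = {
  batchSizes: [],
  bulkWrite: function (ops, options) {
    this.batchSizes.push(ops.length);
    return Promise.resolve({ matchedCount: ops.length, modifiedCount: ops.length });
  }
};

const sampleIds = Array.from({ length: 1200 }, function (_, i) {
  return "id" + i;
});
const batchRun = updateInBatches(fakeCollection, sampleIds, "processed", 500);
```

Waiting for each batch keeps only one bulk request in flight at a time, which avoids the pile-up of unresolved operations described earlier, at the cost of some parallelism.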

For Mongoose versions ~3.8.8, ~3.8.22 and 4.x, which support MongoDB Server >=2.6.x, you could use the Bulk API as follows:

    var bulk = Model.collection.initializeOrderedBulkOp(),
        counter = 0;

    processedIds.forEach(function (id) {
        bulk.find({ "_id": id }).updateOne({
            "$set": { "status": "processed" }
        });

        counter++;
        if (counter % 500 == 0) {
            bulk.execute(function (err, r) {
                // do something with the result
            });
            // Start a fresh batch right away; an executed bulk cannot be reused
            bulk = Model.collection.initializeOrderedBulkOp();
            counter = 0;
        }
    });

    // Flush any operations still left in the queue
    if (counter > 0) {
        bulk.execute(function (err, result) {
            // do something with the result here
        });
    }

chridam answered Oct 16 '22 19:10


You can use the {multi: true} option in your update query to update multiple documents at once.

Example:

    employees.update({ _id: { $gt: 3 } }, { $inc: { sortOrder: -1 } }, { multi: true });

The above Mongoose code is equivalent to the following code in the MongoDB shell:

    db.employees.updateMany({ _id: { $gt: 3 } }, { $inc: { sortOrder: -1 } });

Avinash answered Oct 16 '22 17:10