How do you throttle an update script for MongoDB?

Tags:

mongodb

I have a pretty large MongoDB database that I need to update, so I wrote a JavaScript script to do it:

for (var i = 0; i < 1000000; i++) {
    db.test.update(
        {foo_field: dataArray1[i]},
        {$set: { bar_field: dataArray2[i]}},
        {upsert:false}
    )
}

I'm concerned that running this script will do too many writes in a short time and degrade the performance of the database, so I want to rate-limit the updates based on the replication delay.
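To rate-limit on replication delay, the script first needs a lag figure to compare against. Below is a minimal sketch, assuming the document shape returned by the shell's `rs.status()` (a `members` array whose entries carry `stateStr` and `optimeDate`). The helper name `replicationLagSeconds` is hypothetical, and the sample status document is hand-written, not real server output. It reports how many seconds the slowest secondary is behind the primary.

```javascript
// Hypothetical helper: given a replica-set status document shaped like the
// output of rs.status(), return how many seconds the slowest SECONDARY lags
// behind the PRIMARY (0 if there are no secondaries).
function replicationLagSeconds(status) {
    var primary = null;
    var secondaries = [];
    status.members.forEach(function (m) {
        if (m.stateStr === "PRIMARY") {
            primary = m;
        } else if (m.stateStr === "SECONDARY") {
            secondaries.push(m);
        }
        // members in any other state (arbiters, unreachable nodes) are ignored
    });
    if (primary === null) {
        throw new Error("no PRIMARY member in status document");
    }
    var maxLagMs = 0;
    secondaries.forEach(function (s) {
        var lagMs = primary.optimeDate.getTime() - s.optimeDate.getTime();
        if (lagMs > maxLagMs) {
            maxLagMs = lagMs;
        }
    });
    return maxLagMs / 1000;
}

// Example with a hand-written status document (not real server output):
var exampleStatus = {
    members: [
        { name: "a:27017", stateStr: "PRIMARY",   optimeDate: new Date("2015-10-22T23:10:05Z") },
        { name: "b:27017", stateStr: "SECONDARY", optimeDate: new Date("2015-10-22T23:10:03Z") },
        { name: "c:27017", stateStr: "SECONDARY", optimeDate: new Date("2015-10-22T23:10:05Z") }
    ]
};
console.log(replicationLagSeconds(exampleStatus)); // 2
```

In the shell you could call `replicationLagSeconds(rs.status())` between batches. Because only `SECONDARY` members are compared, a secondary that is down will not show up as lag.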

However, I can't find a way to make the script sleep or wait for a given number of milliseconds. The shell complains that setInterval and setTimeout are "not defined". Is this possible in Mongo?

asked Oct 22 '15 at 23:10 by user1943735


2 Answers

If you are using the MongoDB shell to do the updates, neither setInterval nor setTimeout will work. The shell does, however, have a sleep() function, so you could add a counter to the loop:

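// dataArray1 and dataArray2 as in the question
var counter = 0;
for (var i = 0; i < 1000000; i++) {
    db.test.update(
        {foo_field: dataArray1[i]},
        {$set: {bar_field: dataArray2[i]}},
        {upsert: false}
    );
    // sleep 100 ms after every 1000 updates
    counter++;
    if (counter === 1000) {
        sleep(100);
        counter = 0;
    }
}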

The counter makes the shell sleep 100 ms after every 1000 records have been updated. sleep() pauses only the shell, so the code above does not block the server.
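The counter approach can also be tied to replication delay, which is what the question asks for. A minimal sketch, assuming a hypothetical `throttledUpdates` helper whose sleep function and lag check are passed in: in the shell you would pass the built-in `sleep` and a function that derives lag from `rs.status()`; here both are faked so the loop can be exercised in plain Node, which has no `sleep()`. After every `batchSize` updates it pauses once, then keeps pausing while the reported lag exceeds `maxLagSecs`.

```javascript
// Hypothetical helper: call doUpdate(i) for i = 0..count-1. After every
// batchSize calls, pause once, then keep pausing while getLagSecs() reports
// more than maxLagSecs of replication lag. Returns the number of pauses.
function throttledUpdates(count, doUpdate, opts) {
    var pauses = 0;
    for (var i = 0; i < count; i++) {
        doUpdate(i);
        if ((i + 1) % opts.batchSize === 0) {
            // short fixed breather after each batch
            opts.sleepFn(opts.pauseMs);
            pauses++;
            // wait longer while the secondaries are still too far behind
            while (opts.getLagSecs() > opts.maxLagSecs) {
                opts.sleepFn(opts.pauseMs);
                pauses++;
            }
        }
    }
    return pauses;
}

// Demo with fakes: the first lag reading is 10 s, every later reading is 0.
var lagReadings = [10, 0];
var updated = 0;
var slept = 0;
var pauses = throttledUpdates(2500, function () { updated++; }, {
    batchSize: 1000,
    pauseMs: 100,
    maxLagSecs: 5,
    sleepFn: function (ms) { slept += ms; },
    getLagSecs: function () { return lagReadings.length ? lagReadings.shift() : 0; }
});
console.log(updated, pauses, slept); // 2500 3 300
```

In the mongo shell, `doUpdate` would wrap the original `db.test.update(...)` call and `sleepFn` would simply be `sleep`.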

This is not to be confused with the server command sleep (MongoDB Manual 3.0).

Anyway, I wouldn't worry too much about MongoDB being unable to handle the writes. You could use {w:1} as the write concern, which makes each write wait for acknowledgement from the primary before the next one is sent. Finally, as far as I know, the MongoDB shell is not asynchronous: it won't fire all the update calls at the same time but sends them in sequence. The Node.js MongoDB driver, however, is asynchronous, so you could use a library such as async.js to control the flow of data, for example bulk-inserting 1000 documents at a time with 1-2 workers.
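The async.js idea can also be sketched with plain Promises; this swaps in a hand-rolled worker pool rather than async.js itself. A minimal sketch, assuming a hypothetical `writeBatch(batch)` function that returns a Promise (in practice it might wrap a driver bulk write): the items are cut into batches of `batchSize`, and at most `workers` batches are in flight at any time. The fake writer only exists to show the concurrency limit without a server.

```javascript
// Split items into consecutive batches of batchSize.
function chunk(items, batchSize) {
    const batches = [];
    for (let i = 0; i < items.length; i += batchSize) {
        batches.push(items.slice(i, i + batchSize));
    }
    return batches;
}

// Hypothetical helper: process batches with at most `workers` calls to
// writeBatch in flight. Each worker takes the next unclaimed batch until none remain.
async function runWithWorkers(items, batchSize, workers, writeBatch) {
    const batches = chunk(items, batchSize);
    let next = 0;
    async function worker() {
        while (next < batches.length) {
            const batch = batches[next++];
            await writeBatch(batch);
        }
    }
    const pool = [];
    for (let w = 0; w < Math.min(workers, batches.length); w++) {
        pool.push(worker());
    }
    await Promise.all(pool);
    return batches.length;
}

// Demo with a fake writeBatch that records how many calls overlap.
let inFlight = 0;
let maxInFlight = 0;
let written = 0;
function fakeWriteBatch(batch) {
    inFlight++;
    maxInFlight = Math.max(maxInFlight, inFlight);
    return new Promise(function (resolve) {
        setTimeout(function () {
            written += batch.length;
            inFlight--;
            resolve();
        }, 5);
    });
}

const items = Array.from({ length: 4500 }, (_, i) => i);
runWithWorkers(items, 1000, 2, fakeWriteBatch).then(function (batchCount) {
    console.log(batchCount, written, maxInFlight); // 5 4500 2
});
```

With `workers` set to 1, batches go out strictly one after another; 2 keeps one extra batch in flight, matching the 1-2 workers suggested above.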

answered Oct 08 '22 at 21:10 by jpaljasma


BULK API: YOU MAY WANT TO USE IT

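You might want to look into the Bulk API. It lets you queue all the updates and send them with a single execute() call (the driver splits them into as many write commands as the server's batch-size limit requires), so there is no loop of individual update calls to throttle.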

var MongoClient = require('mongodb').MongoClient;

// older (pre-3.0) driver callback API: connect() passes a Db object
MongoClient.connect('mongodb://127.0.0.1:27017/test', function (err, db) {
    if (err) {
        console.log("DB ERR: " + err);
        process.exit(1);
    }
    console.log("CONNECTED!!");

    // queue the updates client-side in an unordered bulk operation
    var batch = db.collection("test").initializeUnorderedBulkOp({useLegacyOps: true});
    for (var i = 0; i < 5000; i++) {
        var query = {foo_field: "foo_" + i};
        batch.find(query).upsert().updateOne({$set: {foo: "bar_" + i}});
    }

    // send all 5000 queued updates with one execute() call; the driver
    // splits them into write commands that fit the server's batch limit
    batch.execute(function (batchErr, result) {
        if (batchErr) {
            console.log("BATCH ERR:", batchErr);
        }
        if (result) {
            console.log("BATCH RESULT:", result);
        }
        db.close();
    });
});

RESULT

> db.test.find().sort({_id:-1}).limit(4)
{ "_id" : ObjectId("5629a60d028bb6308940dc67"), "foo_field" : "foo_4999", "foo" : "bar_4999" }
{ "_id" : ObjectId("5629a60d028bb6308940dc66"), "foo_field" : "foo_4998", "foo" : "bar_4998" }
{ "_id" : ObjectId("5629a60d028bb6308940dc65"), "foo_field" : "foo_4997", "foo" : "bar_4997" }
{ "_id" : ObjectId("5629a60d028bb6308940dc64"), "foo_field" : "foo_4996", "foo" : "bar_4996" }
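The same batching idea also fits the Node.js driver's `collection.bulkWrite()`, and in Node (unlike the mongo shell) `setTimeout` is available for a pause between chunks. A minimal sketch, assuming `collection` is a driver `Collection` or any object whose `bulkWrite(ops, options)` returns a Promise resolving to a result with `modifiedCount`. `buildUpdateOps` and `bulkUpdateInChunks` are hypothetical names, and the two sample arrays stand in for the question's `dataArray1`/`dataArray2`. Unlike the `upsert()` example, it keeps the question's `upsert: false`.

```javascript
// Hypothetical helper: turn the question's two parallel arrays into bulkWrite
// operations that set bar_field on the document whose foo_field matches,
// without upserting.
function buildUpdateOps(keys, values) {
    return keys.map(function (key, i) {
        return {
            updateOne: {
                filter: { foo_field: key },
                update: { $set: { bar_field: values[i] } },
                upsert: false
            }
        };
    });
}

function delay(ms) {
    return new Promise(function (resolve) { setTimeout(resolve, ms); });
}

// Hypothetical helper: send ops chunkSize at a time, waiting pauseMs between
// chunks, and return the total number of modified documents.
async function bulkUpdateInChunks(collection, ops, chunkSize, pauseMs) {
    let modified = 0;
    for (let i = 0; i < ops.length; i += chunkSize) {
        const result = await collection.bulkWrite(ops.slice(i, i + chunkSize), { ordered: false });
        modified += result.modifiedCount;
        if (i + chunkSize < ops.length) {
            await delay(pauseMs);
        }
    }
    return modified;
}

// Demo with a stand-in collection (no server needed) that records chunk sizes.
const calls = [];
const fakeCollection = {
    bulkWrite: async function (ops, options) {
        calls.push(ops.length);
        return { modifiedCount: ops.length };
    }
};
const dataArray1 = ["a", "b", "c", "d", "e"];
const dataArray2 = [1, 2, 3, 4, 5];
bulkUpdateInChunks(fakeCollection, buildUpdateOps(dataArray1, dataArray2), 2, 10)
    .then(function (modified) { console.log(modified, calls); }); // 5 [ 2, 2, 1 ]
```

Against a real server, `collection` would be `db.collection("test")` from a connected client; a chunk size of 1000 and a pause of a few hundred milliseconds are arbitrary starting points, not tuned values.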
answered Oct 08 '22 at 19:10 by med116