Update with upsert, but only update if date field of document in db is less than updated document

I am having a bit of an issue trying to come up with the logic for this. So, what I want to do is:

  • Bulk update a bunch of posts to my remote MongoDB instance BUT
  • If update, only update if lastModified field on the remote collection is less than lastModified field in the same document that I am about to update/insert

Basically, I want to update my list of documents if they have been modified since the last time I updated them. I can think of two brute force ways to do it...

First, query my entire remote collection, manually remove and replace the documents that match the criteria, add the new ones, then delete everything in the remote collection and bulk insert the whole set back.

Second, query each item individually and then decide, if a matching document exists in the remote collection, whether I want to update it or not. This seems like it would be very taxing when dealing with remote collections, since it means one round trip per document, something along the lines of the sketch below.
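For reference, the per-document version I have in mind looks roughly like this (collection and field names are just placeholders from my setup):

// Rough sketch of the per-document approach: one findOne plus one
// insertOne/replaceOne round trip for every document in the batch.
function slowUpsert(collection, doc, done) {
    collection.findOne({ "_id": doc._id }, function(err, existing) {
        if (err) return done(err);
        if (!existing) {
            // Not in the remote collection yet, insert it
            return collection.insertOne(doc, done);
        }
        if (existing.lastModified < doc.lastModified) {
            // Remote copy is stale, replace it
            return collection.replaceOne({ "_id": doc._id }, doc, done);
        }
        done(); // remote copy is already up to date
    });
}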

If relevant, I am working in a NodeJS environment, using the mongodb npm package for database operations.

asked Dec 30 '16 by Sammy I.

1 Answer

You can use the bulkWrite API to carry out the updates based on the logic you specified, since it lets you express the conditional update for every document in a single request to the server.

For example, the following snippet shows how to go about this assuming you already have the data from the web service you need to update the remote collection with:

var MongoClient = require("mongodb").MongoClient;

MongoClient.connect(mongo_url, function(err, db) {
    if (err) return console.log(err);

    var mongo_remote_collection = db.collection("remote_collection_name");

    /* data comes from an http call to an external service; ideally
       place this within the service callback */
    mongoUpsert(mongo_remote_collection, data, function() {
        db.close();
    });
});

function mongoUpsert(collection, data_array, cb) {
    // Build one updateOne operation per incoming document. The filter
    // only matches the existing document when its lastModified is older
    // than the incoming one, so up-to-date documents are left untouched.
    var ops = data_array.map(function(data) {
        return {
            "updateOne": {
                "filter": {
                    "_id": data._id, // or any other filtering mechanism to identify a doc
                    "lastModified": { "$lt": data.lastModified }
                },
                "update": { "$set": data },
                "upsert": true
            }
        };
    });

    // Send all the operations in a single request and invoke the
    // callback only once the write has actually completed.
    collection.bulkWrite(ops, function(err, r) {
        // do something with result
        cb(err);
    });
}
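One caveat worth noting with this filter: when a document with the given _id already exists but its lastModified is not older, the filter matches nothing, so the upsert attempts an insert and fails with a duplicate key error on _id. Since bulkWrite is ordered by default and stops at the first error, you may want to pass the unordered option so the remaining operations still run; a sketch of that call, treating the duplicate key errors as "already up to date":

collection.bulkWrite(ops, { "ordered": false }, function(err, r) {
    // err will report duplicate key errors for documents that were
    // already up to date; for this use case they can usually be ignored
});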

If the data from the external service is huge, consider sending the writes to the server in batches of 500; this gives better performance because you are not sending a request per document, only one request for every 500 documents.

For bulk operations MongoDB imposes a default internal limit of 1000 operations per batch, so a batch size of 500 is a good choice in that you keep some control over the batch size rather than letting MongoDB impose the default. For the case above you could just write the whole array at once since it is small; the 500-document batching is for larger arrays, i.e. operations in the magnitude of more than 1000 documents.

var ops = [],
    counter = 0;

data_array.forEach(function(data) {
    ops.push({
        "updateOne": {
            "filter": {
                "_id": data._id,
                "lastModified": { "$lt": data.lastModified }
            },
            "update": { "$set": data },
            "upsert": true
        }
    });
    counter++;

    // Flush the accumulated operations every 500 documents
    if (counter % 500 === 0) {
        collection.bulkWrite(ops, function(err, r) {
            // do something with result
        });
        ops = [];
    }
});

// Flush any remaining operations that did not fill a full batch
if (counter % 500 !== 0) {
    collection.bulkWrite(ops, function(err, r) {
        // do something with result
    });
}
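Note that the snippet above fires each bulkWrite without waiting for the previous batch to finish. If you want the batches to run sequentially, here is a sketch of a Promise-based variant (assuming a driver version where bulkWrite returns a promise when no callback is given; mongoUpsertBatched is a hypothetical helper name):

function mongoUpsertBatched(collection, data_array, batchSize) {
    // Slice the incoming array into batches of updateOne operations
    var batches = [];
    for (var i = 0; i < data_array.length; i += batchSize) {
        batches.push(data_array.slice(i, i + batchSize).map(function(data) {
            return {
                "updateOne": {
                    "filter": {
                        "_id": data._id,
                        "lastModified": { "$lt": data.lastModified }
                    },
                    "update": { "$set": data },
                    "upsert": true
                }
            };
        }));
    }

    // Chain the batches so each bulkWrite starts after the previous one
    return batches.reduce(function(promise, batch) {
        return promise.then(function() {
            return collection.bulkWrite(batch);
        });
    }, Promise.resolve());
}

// Usage: mongoUpsertBatched(collection, data, 500).then(function() { db.close(); });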
answered Nov 15 '22 by chridam