Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to update a large number of documents in MongoDB most effeciently?

I want to update large numbers (> 100,000) of documents most efficiently.

My first naive approach was doing it on the JS level, writing scripts that fetch _ids first, then loop through _ids and invoke updates by _id (full docs or $set patches).

I ran into memory issues, also sharding the data into chunks of max. 500 documents (with opening and closing the connection) doesn't seem to work well.

So how can i solve this on the MongoDB level?
Best practice?

I have 3 common use cases, typically maintenance work flows:

1. Change type of value of property, without changing the value.

// before
{
  timestamp : '1446987395'
}

// after
{
  timestamp : 1446987395
}

2. Add new property based on value of existing property.

// before
{
  firstname : 'John',
  lastname  : 'Doe'
}

// after
{
  firstname : 'John',
  lastname  : 'Doe',
  name      : 'John Doe'
}

3. Simply adding removing properties from documents.

// before
{
  street    : 'Whatever Ave',
  street_no : '1025'
}

// after
{
  street    : 'Whatever Ave',
  no        : '1025'
}

Thanks for helping out.

like image 951
ezmilhouse Avatar asked Nov 08 '15 13:11

ezmilhouse


People also ask

How do I update MongoDB bulk records?

We can update multiple documents from the collection by using the bulk update method in MongoDB. 2) InitializeOrderedBulkOp – This method is used with the bulk update method to update multiple documents from the collection. We need to call this method at the time of using the bulk update method in MongoDB.

Can MongoDB handle millions of records?

Working with MongoDB and ElasticSearch is an accurate decision to process millions of records in real-time. These structures and concepts could be applied to larger datasets and will work extremely well too.

How many documents MongoDB can handle?

Mongo can easily handle billions of documents and can have billions of documents in the one collection but remember that the maximum document size is 16mb. There are many folk with billions of documents in MongoDB and there's lots of discussions about it on the MongoDB Google User Group.

Is count faster than find MongoDB?

find({}). count() more fast then collection.


1 Answers

If your MongoDB server is 2.6 or newer, it would be better to take advantage of using a write commands Bulk API that allow for the execution of bulk update operations which are simply abstractions on top of the server to make it easy to build bulk operations. These bulk operations come mainly in two flavours:

  • Ordered bulk operations. These operations execute all the operation in order and error out on the first write error.
  • Unordered bulk operations. These operations execute all the operations in parallel and aggregates up all the errors. Unordered bulk operations do not guarantee order of execution.

Note, for older servers than 2.6 the API will downconvert the operations. However it's not possible to downconvert 100% so there might be some edge cases where it cannot correctly report the right numbers.

For your three common use cases, you could implement the Bulk API like this:

Case 1. Change type of value of property, without changing the value:

var MongoClient = require('mongodb').MongoClient;

MongoClient.connect("mongodb://localhost:27017/test", function(err, db) {
    // Handle error
    if(err) throw err;

    // Get the collection and bulk api artefacts
    var col = db.collection('users'),           
        bulk = col.initializeOrderedBulkOp(), // Initialize the Ordered Batch
        counter = 0;        

    // Case 1. Change type of value of property, without changing the value.        
    col.find({"timestamp": {"$exists": true, "$type": 2} }).each(function (err, doc) {

        var newTimestamp = parseInt(doc.timestamp);
        bulk.find({ "_id": doc._id }).updateOne({
            "$set": { "timestamp": newTimestamp }
        });

        counter++;

        if (counter % 1000 == 0 ) {
            bulk.execute(function(err, result) {  
                // re-initialise batch operation           
                bulk = col.initializeOrderedBulkOp();
            });
        }
    });

    if (counter % 1000 != 0 ){
        bulk.execute(function(err, result) {
            // do something with result
            db.close();
        }); 
    } 
});

Case 2. Add new property based on value of existing property:

MongoClient.connect("mongodb://localhost:27017/test", function(err, db) {
    // Handle error
    if(err) throw err;

    // Get the collection and bulk api artefacts
    var col = db.collection('users'),           
        bulk = col.initializeOrderedBulkOp(), // Initialize the Ordered Batch
        counter = 0;        

    // Case 2. Add new property based on value of existing property.        
    col.find({"name": {"$exists": false } }).each(function (err, doc) {

        var fullName = doc.firstname + " " doc.lastname;
        bulk.find({ "_id": doc._id }).updateOne({
            "$set": { "name": fullName }
        });

        counter++;

        if (counter % 1000 == 0 ) {
            bulk.execute(function(err, result) {  
                // re-initialise batch operation           
                bulk = col.initializeOrderedBulkOp();
            });
        }
    });

    if (counter % 1000 != 0 ){
        bulk.execute(function(err, result) {
            // do something with result
            db.close();
        }); 
    } 
});

Case 3. Simply adding removing properties from documents.

MongoClient.connect("mongodb://localhost:27017/test", function(err, db) {
    // Handle error
    if(err) throw err;

    // Get the collection and bulk api artefacts
    var col = db.collection('users'),           
        bulk = col.initializeOrderedBulkOp(), // Initialize the Ordered Batch
        counter = 0;        

    // Case 3. Simply adding removing properties from documents.    
    col.find({"street_no": {"$exists": true } }).each(function (err, doc) {

        bulk.find({ "_id": doc._id }).updateOne({
            "$set": { "no": doc.street_no },
            "$unset": { "street_no": "" }
        });

        counter++;

        if (counter % 1000 == 0 ) {
            bulk.execute(function(err, result) {  
                // re-initialise batch operation           
                bulk = col.initializeOrderedBulkOp();
            });
        }
    });

    if (counter % 1000 != 0 ){
        bulk.execute(function(err, result) {
            // do something with result
            db.close();
        }); 
    } 
});
like image 165
chridam Avatar answered Oct 14 '22 16:10

chridam