Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Bulk updating data in DocumentDB

I have a desire to add a property with a default value to a set of documents that I retrieve via a SELECT query if they contain no value.

I was thinking of this in two parts:

  1. SELECT * FROM c article WHERE article.details.locale = 'en-us'

I'd like to find all articles where article.details.x does not exist.

  1. Add the property, article.details.x = true

I was hoping this EXEC command could be supported via the Azure Portal so I don't have to create a migration tool to run this command once but I couldn't find this option in the portal. Is this possible?

like image 726
Valchris Avatar asked Jan 12 '16 19:01

Valchris


People also ask

How do I update multiple files in MongoDB?

You can update multiple documents using the collection. updateMany() method. The updateMany() method accepts a filter document and an update document. If the query matches documents in the collection, the method applies the updates from the update document to fields and values of the matching documents.

Can we update data in Cosmos DB?

Change feed in Azure Cosmos DB listens to a container for any changes and then outputs documents that were changed. Using change feed, you see all updates to documents including both partial and full document updates.

Is AWS DocumentDB same as MongoDB?

Amazon DocumentDB is a NoSQL JSON document database service with a limited degree of compatibility with MongoDB. DocumentDB is not based on the MongoDB server. Rather it emulates the MongoDB API, and runs on top of Amazon's Aurora backend platform.

What is bulk in MongoDB?

Bulk() Bulk operations builder used to construct a list of write operations to perform in bulk for a single collection. To instantiate the builder, use either the db. collection. initializeOrderedBulkOp() or the db.


2 Answers

You can use Azure Document DB Studio as a front end to creating and executing a stored procedure. It can be found here. It's pretty easy to setup and use.

I've mocked up a stored procedure based on your example:

function updateArticlesDetailsX() {

   var collection = getContext().getCollection();
   var collectionLink = collection.getSelfLink();
   var response = getContext().getResponse();
   var docCount = 0;
   var counter = 0;

   tryQueryAndUpdate();

   function tryQueryAndUpdate(continuation) {
        var query = {
            query: "select * from root r where IS_DEFINED(r.details.x) != true"
        };

        var requestOptions = {
            continuation: continuation
        };

        var isAccepted =
            collection
            .queryDocuments(collectionLink,
                            query,
                            requestOptions,
                            function queryCallback(err, documents, responseOptions) {
                                     if (err) throw err;
                                     if (documents.length > 0) {
                                        // If at least one document is found, update it.
                                        docCount = documents.length;
                                        for (var i=0; i<docCount; i++){
                                            tryUpdate(documents[i]);
                                        }
                                        response.setBody("Updated " + docCount + " documents");
                                      }
                                      else if (responseOptions.continuation) {
                                          // Else if the query came back empty, but with a continuation token; 
                                          // repeat the query w/ the token.
                                        tryQueryAndUpdate(responseOptions.continuation);
                                      } else {
                                             throw new Error("Document not found.");
                                             }
                            });

        if (!isAccepted) {
            throw new Error("The stored procedure timed out");
        }
    }

    function tryUpdate(document) {
        //Optimistic concurrency control via HTTP ETag.
        var requestOptions = { etag: document._etag };

        //Update statement goes here:
        document.details.x = "some new value";

        var isAccepted = collection
                         .replaceDocument(document._self,
                                          document,
                                          requestOptions,
                                          function replaceCallback(err, updatedDocument, responseOptions) {
                                                   if (err) throw err;
                                                   counter++;
                                           });

        // If we hit execution bounds - throw an exception.
        if (!isAccepted) {
            throw new Error("The stored procedure timed out");
        }
    }
}

I got the rough outline for this code from Andrew Liu on GitHub.

This outline should be close to what you need to do.

like image 157
cnaegle Avatar answered Oct 17 '22 18:10

cnaegle


DocumentDB has no way in a single query to update a bunch of documents. However, the portal does have a Script Explorer that allows you to write and execute a stored procedure against a single collection. Here is an example sproc that combines a query with a replaceDocument command to update some documents that you could use as a starting point for writing your own. The one gotcha to keep in mind is that DocumentDB will not allow sprocs to run longer than 5 seconds (with some buffer). So you may have to run your sproc multiple times and keep track of what you've already done if it can't complete in one 5 second run. The use of IS_DEFINED(collection.field.subfield) != true (thanks @cnaegle) in your query followed up by a document replacement that defines that field (or removes that document) should allow you to run the sproc as many times as necessary.

If you didn't want to write a sproc, the easiest thing to do would be to export the database using the DocumentDB Data Migration tool. Import that into Excel to manipulate or write a script to do the manipulation. Then upload it again using the Data Migration tool.

like image 6
Larry Maccherone Avatar answered Oct 17 '22 19:10

Larry Maccherone