Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Function insertMany() unordered: proper way to get both the errors and the result?

Tags:

It seems that MongoDB insertMany() function with ordered option set to false can insert documents more efficiently than with ordered option set to true. And it can continue inserting documents even if several documents fail to insert.

But I found that there is no clean way to get both each failed document's error and overall command result.

(BTW, I'm using Node.js driver API 2.2. I will refer the driver's source code from now on: http://mongodb.github.io/node-mongodb-native/2.2/api/lib_collection.js.html)

First, If Promise is used, there is no way to get both the errors and the result. On source code line 540, insertMany() returns either the error or the result - not both, while bulkWrite() callback returns both on source code line 703.

Second, If callback is used, things get worse. When bulkWrite() calls the callback with both the error and the result, insertMany() calls the callback with both the error and the result, but the result is the BulkWrite's result, not the properly converted one to InsertManyResults. Please see the source code line 535. I think this is a bug.

And even for bulkWrite(), when the number of the errors is 1, it does not convert the result properly to its format. Please see the source code line 669. I think this is also a bug.

Now I think that the Node.js driver is simply not prepared to handle the case.

For now, it seems that there is no way to get both the errors and results properly.

Am I right?

UPDATE

I've run a test code which is based on Neil Lunn's answer. Since my Node.js(4.4.5) does not understand async/await, I had to rewrite the test code with explicit Promises.

The test code is as follows:

function doTest()
{
    var MongoClient = require('mongodb').MongoClient;
    var testData = [ 1,2,2,3,3,4,5,6,7,8,9 ];
    var db;

    return MongoClient.connect('mongodb://127.0.0.1/test')
    .then(function (_db)
    {
        db = _db;
        return db.createCollection('test');
    })
    .then(function ()
    {
        return db.collection('test').deleteMany({})
        .then(function ()
        {
            return db.collection('test').insertMany(
                testData.map(function (_id)
                {
                    return { _id: _id };
                }),
                { ordered: false })
            .then(function (result)
            {
                console.log('Promise: result', result);
            }, function (err)
            {
                console.log('Promise: error', err);
            });
        })
        .then(function ()
        {
            return db.collection('test').deleteMany({});
        })
        .then(function ()
        {
            return new Promise(function (resolve, reject)
            {
                return db.collection('test').insertMany(
                    testData.map(function (_id)
                    {
                        return { _id: _id };
                    }),
                    { ordered: false },
                    function (err, result)
                {
                    console.log('callback: error', err);
                    console.log('callback: result', result);
                    console.log('callback: result.hasWriteErrors', result.hasWriteErrors());
                    console.log('callback: result.getWriteErrors',
                        JSON.stringify(result.getWriteErrors(), null, 2));
                    resolve();
                });
            });
        });
    })
    .catch(function (err)
    {
        console.log('catch', err);
    })
    .then(function ()
    {
        db.close();
    });
}
doTest();

And here is the result:

Promise: error { [MongoError: write operation failed]
  name: 'MongoError',
  message: 'write operation failed',
  driver: true,
  code: 11000,
  writeErrors: 
   [ { code: [Getter],
       index: [Getter],
       errmsg: [Getter],
       getOperation: [Function],
       toJSON: [Function],
       toString: [Function] },
     { code: [Getter],
       index: [Getter],
       errmsg: [Getter],
       getOperation: [Function],
       toJSON: [Function],
       toString: [Function] } ] }
callback: error { [MongoError: write operation failed]
  name: 'MongoError',
  message: 'write operation failed',
  driver: true,
  code: 11000,
  writeErrors: 
   [ { code: [Getter],
       index: [Getter],
       errmsg: [Getter],
       getOperation: [Function],
       toJSON: [Function],
       toString: [Function] },
     { code: [Getter],
       index: [Getter],
       errmsg: [Getter],
       getOperation: [Function],
       toJSON: [Function],
       toString: [Function] } ] }
callback: result { ok: [Getter],
  nInserted: [Getter],
  nUpserted: [Getter],
  nMatched: [Getter],
  nModified: [Getter],
  nRemoved: [Getter],
  getInsertedIds: [Function],
  getUpsertedIds: [Function],
  getUpsertedIdAt: [Function],
  getRawResponse: [Function],
  hasWriteErrors: [Function],
  getWriteErrorCount: [Function],
  getWriteErrorAt: [Function],
  getWriteErrors: [Function],
  getLastOp: [Function],
  getWriteConcernError: [Function],
  toJSON: [Function],
  toString: [Function],
  isOk: [Function],
  insertedCount: 9,
  matchedCount: 0,
  modifiedCount: 0,
  deletedCount: 0,
  upsertedCount: 0,
  upsertedIds: {},
  insertedIds: 
   { '0': 1,
     '1': 2,
     '2': 2,
     '3': 3,
     '4': 3,
     '5': 4,
     '6': 5,
     '7': 6,
     '8': 7,
     '9': 8,
     '10': 9 },
  n: 9 }
callback: result.hasWriteErrors true
callback: result.getWriteErrors [
  {
    "code": 11000,
    "index": 2,
    "errmsg": "E11000 duplicate key error collection: test.test index: _id_ dup key: { : 2 }",
    "op": {
      "_id": 2
    }
  },
  {
    "code": 11000,
    "index": 4,
    "errmsg": "E11000 duplicate key error collection: test.test index: _id_ dup key: { : 3 }",
    "op": {
      "_id": 3
    }
  }
]

Now, I've run the code again, with the testData variable modified as follows:

var testData = [ 1,2,3,3,4,5,6,7,8,9 ];

In this case, the number of errors will be 1, instead of 2, since the duplicate '2' is removed.

And here is the result:

Promise: error { [MongoError: E11000 duplicate key error collection: test.test index: _id_ dup key: { : 3 }]
  name: 'MongoError',
  message: 'E11000 duplicate key error collection: test.test index: _id_ dup key: { : 3 }',
  driver: true,
  code: 11000,
  index: 3,
  errmsg: 'E11000 duplicate key error collection: test.test index: _id_ dup key: { : 3 }',
  getOperation: [Function],
  toJSON: [Function],
  toString: [Function] }
callback: error { [MongoError: E11000 duplicate key error collection: test.test index: _id_ dup key: { : 3 }]
  name: 'MongoError',
  message: 'E11000 duplicate key error collection: test.test index: _id_ dup key: { : 3 }',
  driver: true,
  code: 11000,
  index: 3,
  errmsg: 'E11000 duplicate key error collection: test.test index: _id_ dup key: { : 3 }',
  getOperation: [Function],
  toJSON: [Function],
  toString: [Function] }
callback: result { ok: [Getter],
  nInserted: [Getter],
  nUpserted: [Getter],
  nMatched: [Getter],
  nModified: [Getter],
  nRemoved: [Getter],
  getInsertedIds: [Function],
  getUpsertedIds: [Function],
  getUpsertedIdAt: [Function],
  getRawResponse: [Function],
  hasWriteErrors: [Function],
  getWriteErrorCount: [Function],
  getWriteErrorAt: [Function],
  getWriteErrors: [Function],
  getLastOp: [Function],
  getWriteConcernError: [Function],
  toJSON: [Function],
  toString: [Function],
  isOk: [Function] }
callback: result.hasWriteErrors true
callback: result.getWriteErrors [
  {
    "code": 11000,
    "index": 3,
    "errmsg": "E11000 duplicate key error collection: test.test index: _id_ dup key: { : 3 }",
    "op": {
      "_id": 3
    }
  }
]

The format of error and result is fairly different from the first run.

  • the error has no writeErrors field.
  • the result has no 'converted' fields. (insertedCount, matchedCount, etc.) This is a 'bug' on the driver source code line 669 as I've told above.

And in both test runs, the types of the result argument are NOT Collection~insertWriteOpResult. The first one is Collection~bulkWriteOpCallback, and the second one is more internal one. So, the API document is wrong in this case. This is caused by the 'bugs' on line 535 and 669 as I've told above.

So, even if the result can be used (indeed, the result has hasWriteErrors() and getWriteErrors() as Neil Lunn told), since this behavior is not documented, I suspect that it can be changed on later versions without notice, and my code will break.

like image 991
zeodtr Avatar asked Oct 20 '17 01:10

zeodtr


People also ask

What does insertMany return?

The insertMany() method returns a document that contains: The acknowledged key sets to true if operation executed with a write concern or false if the write concern was disabled. An array of _id values of successfully inserted documents.

What is the difference between insert and insertMany in MongoDB?

With insertOne you can insert one document into the collection. insertMany accepts an array of documents and these are inserted. The insert (the method from older versions) takes a single document by default, and there is an option to insert multiple documents supplied as an array.

What is command to insert many document in a collection?

To insert multiple documents, use db. collection. insertMany() .


1 Answers

The issue is really only with how a "Promise" resolves and how the error information is passed, but of course the real core issue is that there is in fact a return of "both" the error and result information when any "Bulk" operation is set to { ordered: false }. This is being addressed in the driver for the 3.x release as noted on NODE-1158 which also contains links to the commits in the future branch that fix the issue.

The "workaround" for this is to be aware that "both" the result an error information are present in the BulkWriteResult object returned as the result in the "callback" invocation of any such methods ( note insertMany() and even bulkWrite() actually wrap an underlying Bulk API Implementation ).

To demonstrate with a listing:

const MongoClient = require('mongodb').MongoClient;

const uri = 'mongodb://localhost/test';
const testData = [1,2,3,3,4,5,6,6,7,8,9];

(async function() {

  let db;

  try {

    db = await MongoClient.connect(uri);

    await db.collection('test').remove();

    // Expect an error here - but it's just the errors
    try {
      let result = await db.collection('test').insertMany(
        testData.map( _id => ({ _id }) ),
        { "ordered": false }
      );
      console.log(result);   // never gets here
    } catch(e) {
      console.dir(e);
      console.log(JSON.stringify(e.writeErrors,undefined,2));
    }

    await db.collection('test').remove();
    // Wrapped callback so we see what happens

    try {
      let result = await new Promise((resolve,reject) => 
        db.collection('test').insertMany(
          testData.map( _id => ({ _id }) ),
          { "ordered": false },
          (err,result) => {
            if (err) reject(result);    // Because the errors are here as well
            resolve(result);
          }
        )
      );
      console.log(result);  // Never gets here
    } catch(e) {
      console.dir(e);
      console.log(e.hasWriteErrors());
      console.log(JSON.stringify(e.getWriteErrors(),undefined,2));
    }

  } catch(e) {
    console.error(e);
  } finally {
    db.close();
  }

})();

So there are two blocks of code there that attempt to use insertMany() with a list of values that are going to produce duplicate key errors for some of the values.

In the first attempt, we use the default Promise return which as should be indicated by the implemented code of the driver is simply going to pass the err callback result in the method it wraps into it's reject() statement. This means we go to the catch block here and produce the error information as output:

{ MongoError: [object Object]
    at Function.MongoError.create (/home/neillunn/projects/bulkerror/node_modules/mongodb-core/lib/error.js:31:11)
    at toError (/home/neillunn/projects/bulkerror/node_modules/mongodb/lib/utils.js:139:22)
    at /home/neillunn/projects/bulkerror/node_modules/mongodb/lib/collection.js:701:23
    at handleCallback (/home/neillunn/projects/bulkerror/node_modules/mongodb/lib/utils.js:120:56)
    at /home/neillunn/projects/bulkerror/node_modules/mongodb/lib/bulk/unordered.js:465:9
    at handleCallback (/home/neillunn/projects/bulkerror/node_modules/mongodb/lib/utils.js:120:56)
    at resultHandler (/home/neillunn/projects/bulkerror/node_modules/mongodb/lib/bulk/unordered.js:413:5)
    at /home/neillunn/projects/bulkerror/node_modules/mongodb-core/lib/connection/pool.js:469:18
    at _combinedTickCallback (internal/process/next_tick.js:131:7)
    at process._tickCallback (internal/process/next_tick.js:180:9)
  name: 'MongoError',
  message: 'write operation failed',
  driver: true,
  code: 11000,
  writeErrors:
   [ WriteError {
       code: [Getter],
       index: [Getter],
       errmsg: [Getter],
       getOperation: [Function],
       toJSON: [Function],
       toString: [Function] },
     WriteError {
       code: [Getter],
       index: [Getter],
       errmsg: [Getter],
       getOperation: [Function],
       toJSON: [Function],
       toString: [Function] } ] }
[
  {
    "code": 11000,
    "index": 3,
    "errmsg": "E11000 duplicate key error collection: test.test index: _id_ dup key: { : 3 }",
    "op": {
      "_id": 3
    }
  },
  {
    "code": 11000,
    "index": 7,
    "errmsg": "E11000 duplicate key error collection: test.test index: _id_ dup key: { : 6 }",
    "op": {
      "_id": 6
    }
  }
]

Note that this is a wrapped MongoError and despite the fact we set { ordered: false } there is no "result" information in the response. We can see on detailed inspection of the error information that the list of WriteError does have details on each duplicate key error produced.

So there was successful writes for everything in the batch that did not throw an error, but it's not reported in anything obtainable from the Promise. But this is not true of the underlying method, which is still implemented using a callback.

The second attempt wraps this callback "manually", so we can in fact see the result by altering the behavior and passing the result object to reject when an err is present. This tells us a different story:

BulkWriteResult {
  ok: [Getter],
  nInserted: [Getter],
  nUpserted: [Getter],
  nMatched: [Getter],
  nModified: [Getter],
  nRemoved: [Getter],
  getInsertedIds: [Function],
  getUpsertedIds: [Function],
  getUpsertedIdAt: [Function],
  getRawResponse: [Function],
  hasWriteErrors: [Function],
  getWriteErrorCount: [Function],
  getWriteErrorAt: [Function],
  getWriteErrors: [Function],
  getLastOp: [Function],
  getWriteConcernError: [Function],
  toJSON: [Function],
  toString: [Function],
  isOk: [Function],
  insertedCount: 9,
  matchedCount: 0,
  modifiedCount: 0,
  deletedCount: 0,
  upsertedCount: 0,
  upsertedIds: {},
  insertedIds:
   { '0': 1,
     '1': 2,
     '2': 3,
     '3': 3,
     '4': 4,
     '5': 5,
     '6': 6,
     '7': 6,
     '8': 7,
     '9': 8,
     '10': 9 },
  n: 9 }
true
[
  {
    "code": 11000,
    "index": 3,
    "errmsg": "E11000 duplicate key error collection: test.test index: _id_ dup key: { : 3 }",
    "op": {
      "_id": 3
    }
  },
  {
    "code": 11000,
    "index": 7,
    "errmsg": "E11000 duplicate key error collection: test.test index: _id_ dup key: { : 6 }",
    "op": {
      "_id": 6
    }
  }
]

Since we did not pass back the err we now see the BulkWriteResult in the catch block. We know we got there because of the specific code we are running in that block to inspect the result.

The regular result does indeed have some things as the modified or inserted count, as well as the list of insertedIds. We can also see from inspection that hasWriteErrors() returns true, and we can obtain the list of WriteError which we have serialized for better viewing.

Fixed in 3.x

As noted on the linked issue, the actual fix will only be present in the 3.x driver release which will support MongoDB 3.6. The "fix" is essentially done at a "lower level" to not return a BulkWriteResult at all, but instead make err return a BulkWriteError.

This actually makes things more consistent with how some other drivers have correctly implemented this already. It's honestly a bit of a "hangover" from traditional "node style" callbacks, where "both" error and response are always returned.

So bringing this into "just an error" makes things more consistent and works as you would generally expect.

As a bit of a side note, the related issue in MongoDB Node.js native driver silently swallows bulkWrite exception. referenced in the JIRA issue shows the actual bulkWrite() method in the implementation ( which is not "directly" what insertMany() wraps ) has a slightly different problem in that "no error" actually gets thrown at all, because the code is expecting result to be null and as described it is not.

So the reverse case is true there where we never get to catch for the exception in the default implementation using a Promise. However the exact same method of handling is what should be applied, by manually wrapping the callback and sending the result through reject in preference to err when both those are returned as not null.

Work around them as described, and preferably move to the new driver as soon as it becomes available. Which should be fairly soon anyway.

like image 119
Neil Lunn Avatar answered Oct 11 '22 12:10

Neil Lunn