MongoDB bulkWrite multiple updateOne vs updateMany

I have cases where I build bulkWrite operations in which some documents share the same update object. Is there any performance benefit to merging their filters and sending one updateMany with the combined filter instead of multiple updateOnes in the same bulkWrite?

It's obviously better to use updateMany over multiple updateOnes when using the normal methods, but since bulkWrite is a single command either way, are there any significant gains from preferring one over the other?

Example:

I have 200K documents that I need to update, and there are only 10 unique status values across all 200K documents, so my options are:

A) Send one single bulkWrite with 10 updateMany operations, each of which will affect 20K documents.

B) Send one single bulkWrite with 200K updateOne operations, each holding its own filter and status.
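
For illustration, here is roughly what the two payloads would look like. This is only a sketch, not the actual code: `idsByStatus` (a map from each status to the ~20K matching _ids), `docs`, and `collection` are hypothetical:

// `idsByStatus`, `docs`, and `collection` are hypothetical; assume an async context.

// Option A: 10 updateMany operations, one per unique status.
const updateManyOps = Object.entries(idsByStatus).map(([status, ids]) => ({
  updateMany: {
    filter: { _id: { $in: ids } }, // ~20K _ids per status
    update: { $set: { status } }
  }
}));

// Option B: 200K updateOne operations, one per document.
const updateOneOps = docs.map((doc) => ({
  updateOne: {
    filter: { _id: doc._id },
    update: { $set: { status: doc.status } }
  }
}));

// Either way it is a single bulkWrite command:
await collection.bulkWrite(updateManyOps); // or updateOneOps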

As @AlexBlex noted, I have to look out for accidentally updating more than one document with the same filter. In my case I use _id as my filter, so accidentally updating other documents is not a concern for me, but it is definitely something to watch out for when considering the updateMany option.

Thanks @AlexBlex.

Asked Dec 05 '22 by Hafez

1 Answer

Short answer:

Using updateMany is at least twice as fast, but it might accidentally update more documents than you intended. Keep reading to learn how to avoid this while gaining the performance benefits.

Long answer:

We ran the following experiment to find the answer. These are the steps:

  1. Create a bankaccounts MongoDB collection where each document contains only one field (balance).
  2. Insert 1 million documents into the bankaccounts collection.
  3. Randomize the in-memory order of all 1 million documents, so the database gains no advantage from ids arriving in insertion order, simulating a real-world scenario.
  4. Build the write operations for bulkWrite from the documents, each update setting balance to a random number between 0 and 100.
  5. Execute the bulkWrite.
  6. Log the time the bulkWrite took.

The two variations of the experiment differ only in step 4.

In the first variation, we build an array of 1 million updateOne operations, each with a filter for a single document and its respective update object.

In the second variation, we build 100 updateMany operations, each with a filter for 10K document ids and their respective update.

Results: updateMany with multiple document ids is about 2.4 times as fast as multiple updateOnes (51.28 / 21.04 ≈ 2.44). This cannot be used everywhere, though; please read "The risk" section below to learn when it should be used.

Details: we ran the script five times for each variation. The detailed results: updateOne took 51.28 seconds on average; updateMany took 21.04 seconds on average.

The risk: as many people have already pointed out, updateMany is not a direct substitute for updateOne, since it can incorrectly update multiple documents when the intention was to update only one. This approach is only valid when filtering on a unique field such as _id; if the filter depends on fields that are not unique, multiple documents will be updated and the results will not be equivalent.
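
For example, here is a sketch of the difference (`collection`, the ids `id1`..`id3`, and the `status` field are hypothetical):

// Safe: _id is unique, so this updates exactly the documents listed.
// (`id1`, `id2`, `id3` are hypothetical ObjectIds.)
await collection.bulkWrite([{
  updateMany: {
    filter: { _id: { $in: [id1, id2, id3] } },
    update: { $set: { balance: 0 } }
  }
}]);

// Risky: `status` is not unique, so this updates every matching
// document, not just the one you may have had in mind.
await collection.bulkWrite([{
  updateMany: {
    filter: { status: 'pending' },
    update: { $set: { balance: 0 } }
  }
}]);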

65831219.js

// 65831219.js
'use strict';
const mongoose = require('mongoose');
const { Schema } = mongoose;

const DOCUMENTS_COUNT = 1_000_000;
const UPDATE_MANY_OPERATIONS_COUNT = 100;
const MINIMUM_BALANCE = 0;
const MAXIMUM_BALANCE = 100;
const SAMPLES_COUNT = 10;

const bankAccountSchema = new Schema({
  balance: { type: Number }
});

const BankAccount = mongoose.model('BankAccount', bankAccountSchema);

mainRunner().catch(console.error);

async function mainRunner () {
  for (let i = 0; i < SAMPLES_COUNT; i++) {
    await runOneCycle(buildUpdateManyWriteOperations).catch(console.error);
    await runOneCycle(buildUpdateOneWriteOperations).catch(console.error);
    console.log('-'.repeat(80));
  }
  process.exit(0);
}

/**
 * @param {Function} buildBulkWrite - either `buildUpdateManyWriteOperations`
 * or `buildUpdateOneWriteOperations`.
 */
async function runOneCycle (buildBulkWrite) {
  await mongoose.connect('mongodb://localhost:27017/test', {
    useNewUrlParser: true,
    useUnifiedTopology: true
  });

  await mongoose.connection.dropDatabase();

  const { accounts } = await createAccounts({ accountsCount: DOCUMENTS_COUNT });

  const { writeOperations } = buildBulkWrite({ accounts });

  const writeStartedAt = Date.now();

  await BankAccount.bulkWrite(writeOperations);

  const writeEndedAt = Date.now();

  console.log(`Write operations took ${(writeEndedAt - writeStartedAt) / 1000} seconds with \`${buildBulkWrite.name}\`.`);
}



// Inserts `accountsCount` documents with random balances and returns them.
async function createAccounts ({ accountsCount }) {
  const rawAccounts = Array.from({ length: accountsCount }, () => ({ balance: getRandomInteger(MINIMUM_BALANCE, MAXIMUM_BALANCE) }));
  const accounts = await BankAccount.insertMany(rawAccounts);

  return { accounts };
}

// Builds 1M updateOne operations, one per document, each filtering by its _id.
function buildUpdateOneWriteOperations ({ accounts }) {
  const writeOperations = shuffleArray(accounts).map((account) => ({
    updateOne: {
      filter: { _id: account._id },
      update: { $set: { balance: getRandomInteger(MINIMUM_BALANCE, MAXIMUM_BALANCE) } }
    }
  }));

  return { writeOperations };
}

// Builds 100 updateMany operations, each targeting a 10K-document chunk by _id.
function buildUpdateManyWriteOperations ({ accounts }) {
  shuffleArray(accounts);
  const accountsChunks = chunkArray(accounts, accounts.length / UPDATE_MANY_OPERATIONS_COUNT);
  const writeOperations = accountsChunks.map((accountsChunk) => ({
    updateMany: {
      filter: { _id: { $in: accountsChunk.map(account => account._id) } },
      update: { $set: { balance: getRandomInteger(MINIMUM_BALANCE, MAXIMUM_BALANCE) } }
    }
  }));

  return { writeOperations };
}


// Returns a random integer in the inclusive range [min, max].
function getRandomInteger (min = 0, max = 1) {
  min = Math.ceil(min);
  max = Math.floor(max);
  return min + Math.floor(Math.random() * (max - min + 1));
}

// Shuffles the array in place (Fisher-Yates) and returns it.
function shuffleArray (array) {
  let currentIndex = array.length;
  let temporaryValue;
  let randomIndex;

  // While there remain elements to shuffle...
  while (0 !== currentIndex) {

    // Pick a remaining element...
    randomIndex = Math.floor(Math.random() * currentIndex);
    currentIndex -= 1;

    // And swap it with the current element.
    temporaryValue = array[currentIndex];
    array[currentIndex] = array[randomIndex];
    array[randomIndex] = temporaryValue;
  }

  return array;
}

// Splits `array` into consecutive chunks of `sizeOfTheChunkedArray` elements.
function chunkArray (array, sizeOfTheChunkedArray) {
  const chunked = [];

  for (const element of array) {
    const last = chunked[chunked.length - 1];

    if (!last || last.length === sizeOfTheChunkedArray) {
      chunked.push([element]);
    } else {
      last.push(element);
    }
  }
  return chunked;
}

Output

$ node 65831219.js
Write operations took 20.803 seconds with `buildUpdateManyWriteOperations`.
Write operations took 50.84 seconds with `buildUpdateOneWriteOperations`.
----------------------------------------------------------------------------------------------------

Tests were run using MongoDB version 4.0.4.

Answered Dec 28 '22 by Hafez