I have cases where I build bulkWrite operations in which some documents share the same update object. Is there any performance benefit to merging their filters and sending one updateMany with the combined filter, instead of multiple updateOne operations in the same bulkWrite?
It's obviously better to use updateMany over multiple updateOne calls when using the normal methods, but with bulkWrite, since it's a single command, are there any significant gains to preferring one over the other?
Example:
I have 200K documents that I need to update, and there are only 10 unique status values across all 200K documents, so my options are:
Solutions:
A) Send one single bulkWrite with 10 updateMany operations, where each operation affects 20K documents.
B) Send one single bulkWrite with 200K updateOne operations, each holding its own filter and status (both shapes are sketched just below).
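To make the two options concrete, here is a rough sketch of what both operation arrays could look like; the docsToUpdate sample and MyModel are placeholders for this illustration, not my actual code:
// Hypothetical input: the desired end state for each document.
const docsToUpdate = [
  { _id: 1, status: 'shipped' },
  { _id: 2, status: 'shipped' },
  { _id: 3, status: 'pending' }
];

// Option A: group documents that should receive the same status and merge their _id filters.
const idsByStatus = new Map();
for (const doc of docsToUpdate) {
  if (!idsByStatus.has(doc.status)) idsByStatus.set(doc.status, []);
  idsByStatus.get(doc.status).push(doc._id);
}
const updateManyOperations = [...idsByStatus.entries()].map(([status, ids]) => ({
  updateMany: {
    filter: { _id: { $in: ids } },
    update: { $set: { status } }
  }
}));

// Option B: one updateOne operation per document, each holding its own filter and status.
const updateOneOperations = docsToUpdate.map((doc) => ({
  updateOne: {
    filter: { _id: doc._id },
    update: { $set: { status: doc.status } }
  }
}));

// Either array is sent as a single command:
// await MyModel.bulkWrite(updateManyOperations); // or updateOneOperations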
As @AlexBlex noted, I have to look out for accidentally updating more than one document with the same filter. In my case I use _id as my filter, so accidentally updating other documents is not a concern for me, but it is definitely something to look out for when considering the updateMany option.
Thanks @AlexBlex.
Short answer:
Using updateMany is at least twice as fast, but it might accidentally update more documents than you intended. Keep reading to learn how to avoid this and still gain the performance benefits.
Long answer:
We ran the following experiment to find out. These are the steps:
1. Connect to the database and drop it.
2. Insert one million documents with random balances.
3. Shuffle the documents in memory.
4. Build the array of bulkWrite operations.
5. Execute the bulkWrite and measure how long it takes.
Now, the experiment lies in the 4th step.
In one variation of the experiment we build an array consisting of 1 million updateOne operations, each with a filter for a single document and its respective update object.
In the second variation, we build 100 updateMany operations, each with a filter for 10K document ids and their respective update.
Results:
updateMany with multiple document ids is roughly 2.4 times as fast as the equivalent updateOne operations. It cannot be used everywhere though; please read "The risk" section to learn when it should be used.
Details:
We ran the script 5 times for each variation; the detailed results are as follows:
With updateOne: 51.28 seconds on average.
With updateMany: 21.04 seconds on average.
The risk:
As many people have already pointed out, updateMany is not a direct substitute for updateOne, since it can incorrectly update multiple documents when the intention was to update only one.
This approach is only valid when the filter uses a unique field such as _id. If the filter depends on fields that are not unique, multiple documents may be updated and the results will not be equivalent.
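For illustration, here is a rough contrast between a safe filter and a risky one; the idsToArchive list and the status values are made up for this example and are not part of the benchmark script below:
// Safe: the merged filter targets exactly the documents you picked, by their unique _id,
// so this updateMany is equivalent to the individual updateOne operations it replaces.
const idsToArchive = [/* the exact _ids you intended to update */];
const safeOperation = {
  updateMany: {
    filter: { _id: { $in: idsToArchive } },
    update: { $set: { status: 'archived' } }
  }
};

// Risky: a non-unique field can match documents you never selected
// (for example ones inserted after you built your list), so the result is no longer equivalent.
const riskyOperation = {
  updateMany: {
    filter: { status: 'pending' },
    update: { $set: { status: 'archived' } }
  }
};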
65831219.js
// 65831219.js
'use strict';

const mongoose = require('mongoose');
const { Schema } = mongoose;

const DOCUMENTS_COUNT = 1_000_000;
const UPDATE_MANY_OPERATIONS_COUNT = 100;
const MINIMUM_BALANCE = 0;
const MAXIMUM_BALANCE = 100;
const SAMPLES_COUNT = 10;

const bankAccountSchema = new Schema({
  balance: { type: Number }
});

const BankAccount = mongoose.model('BankAccount', bankAccountSchema);

mainRunner().catch(console.error);

async function mainRunner () {
  for (let i = 0; i < SAMPLES_COUNT; i++) {
    await runOneCycle(buildUpdateManyWriteOperations).catch(console.error);
    await runOneCycle(buildUpdateOneWriteOperations).catch(console.error);
    console.log('-'.repeat(80));
  }

  process.exit(0);
}

/**
 * Resets the database, seeds the accounts, builds the bulkWrite operations and times their execution.
 *
 * @param {buildUpdateManyWriteOperations|buildUpdateOneWriteOperations} buildBulkWrite
 */
async function runOneCycle (buildBulkWrite) {
  await mongoose.connect('mongodb://localhost:27017/test', {
    useNewUrlParser: true,
    useUnifiedTopology: true
  });

  await mongoose.connection.dropDatabase();

  const { accounts } = await createAccounts({ accountsCount: DOCUMENTS_COUNT });

  const { writeOperations } = buildBulkWrite({ accounts });

  const writeStartedAt = Date.now();

  await BankAccount.bulkWrite(writeOperations);

  const writeEndedAt = Date.now();

  console.log(`Write operations took ${(writeEndedAt - writeStartedAt) / 1000} seconds with \`${buildBulkWrite.name}\`.`);
}

// Seeds the collection with `accountsCount` documents holding random balances.
async function createAccounts ({ accountsCount }) {
  const rawAccounts = Array.from({ length: accountsCount }, () => ({ balance: getRandomInteger(MINIMUM_BALANCE, MAXIMUM_BALANCE) }));
  const accounts = await BankAccount.insertMany(rawAccounts);

  return { accounts };
}

// Variation 1: one updateOne operation per document, 1 million operations in total.
function buildUpdateOneWriteOperations ({ accounts }) {
  const writeOperations = shuffleArray(accounts).map((account) => ({
    updateOne: {
      filter: { _id: account._id },
      update: { balance: getRandomInteger(MINIMUM_BALANCE, MAXIMUM_BALANCE) }
    }
  }));

  return { writeOperations };
}

// Variation 2: 100 updateMany operations, each targeting a chunk of 10K document ids via $in.
function buildUpdateManyWriteOperations ({ accounts }) {
  shuffleArray(accounts);
  const accountsChunks = chunkArray(accounts, accounts.length / UPDATE_MANY_OPERATIONS_COUNT);
  const writeOperations = accountsChunks.map((accountsChunk) => ({
    updateMany: {
      filter: { _id: { $in: accountsChunk.map(account => account._id) } },
      update: { balance: getRandomInteger(MINIMUM_BALANCE, MAXIMUM_BALANCE) }
    }
  }));

  return { writeOperations };
}

// Returns a random integer between min and max, inclusive.
function getRandomInteger (min = 0, max = 1) {
  min = Math.ceil(min);
  max = Math.floor(max);

  return min + Math.floor(Math.random() * (max - min + 1));
}

// Fisher-Yates shuffle, in place.
function shuffleArray (array) {
  let currentIndex = array.length;
  let temporaryValue;
  let randomIndex;

  // While there remain elements to shuffle...
  while (0 !== currentIndex) {
    // Pick a remaining element...
    randomIndex = Math.floor(Math.random() * currentIndex);
    currentIndex -= 1;

    // And swap it with the current element.
    temporaryValue = array[currentIndex];
    array[currentIndex] = array[randomIndex];
    array[randomIndex] = temporaryValue;
  }

  return array;
}

// Splits the array into consecutive chunks of the given size.
function chunkArray (array, sizeOfTheChunkedArray) {
  const chunked = [];

  for (const element of array) {
    const last = chunked[chunked.length - 1];
    if (!last || last.length === sizeOfTheChunkedArray) {
      chunked.push([element]);
    } else {
      last.push(element);
    }
  }

  return chunked;
}
Output
$ node 65831219.js
Write operations took 20.803 seconds with `buildUpdateManyWriteOperations`.
Write operations took 50.84 seconds with `buildUpdateOneWriteOperations`.
----------------------------------------------------------------------------------------------------
Tests were run using MongoDB version 4.0.4.