I heard that large batch sizes don't really give any additional performance.
What is the optimum batch size?
Multiple documents can be inserted at a time in MongoDB using a bulk insert operation, in which an array of documents is passed to the insert method as a parameter.
The bulkWrite() method provides the ability to perform bulk insert, update, and delete operations. MongoDB also supports bulk insert through the db.collection.
A document in MongoDB is a JSON-like data structure made up of field and value pairs. To insert documents into a MongoDB collection, we can use methods such as insert(), insertOne() and insertMany().
If you call Insert to insert documents one at a time there is a network round trip for each document. If you call InsertBatch to insert documents in batches there is a network round trip for each batch instead of for each document. InsertBatch is more efficient than Insert because it reduces the number of network round trips.
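As a rough illustration of the batching idea above, here is a minimal Python sketch. It assumes a PyMongo-style collection object that exposes insert_many(); the helper name insert_in_batches and the default batch size of 100 are made up for this example and are not part of any driver.

```python
from itertools import islice


def insert_in_batches(collection, documents, batch_size=100):
    """Insert documents in fixed-size batches.

    `collection` is assumed to expose insert_many(list_of_documents),
    as a PyMongo collection does. Returns the number of insert_many
    calls made, which approximates the number of network round trips.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    iterator = iter(documents)
    calls = 0
    while True:
        # Take the next batch_size documents (fewer for the final batch).
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        # One call per batch instead of one call per document.
        collection.insert_many(batch)
        calls += 1
    return calls
```

With a real collection you would call something like insert_in_batches(db.my_collection, docs, batch_size=100), where db.my_collection is a placeholder name. Note that the driver may split a very large batch further on its own, so the call count is only an approximation of the actual round trips.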
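Suppose you had to insert 1,000,000 documents. You could count the network round trips for different batch sizes: a batch size of 1 needs 1,000,000 round trips, a batch size of 10 needs 100,000, and a batch size of 100 needs 10,000.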
So you see that even a batch size as small as 10 has already eliminated 90% of the network round trips, and a batch size of 100 has eliminated 99% of the network round trips.
This is a somewhat simplified analysis because it ignores the fact that as the batch sizes increase so do the message sizes, but it's more or less accurate.
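The round-trip arithmetic can be reproduced with a few lines of Python. This is only a sketch of the simplified model described here (one round trip per batch, message size ignored); the function names and the list of batch sizes are chosen for illustration.

```python
def round_trips(total_documents, batch_size):
    """Number of batches needed, assuming one network round trip per batch."""
    # Integer ceiling division: the last batch may be partially filled.
    return (total_documents + batch_size - 1) // batch_size


def round_trip_table(total_documents, batch_sizes):
    """Return (batch_size, round_trips, percent_eliminated) for each batch size.

    percent_eliminated is measured against inserting one document at a time.
    """
    baseline = round_trips(total_documents, 1)
    rows = []
    for size in batch_sizes:
        trips = round_trips(total_documents, size)
        eliminated = 100.0 * (1 - trips / baseline)
        rows.append((size, trips, eliminated))
    return rows


if __name__ == "__main__":
    for size, trips, eliminated in round_trip_table(1_000_000, [1, 10, 100, 1000, 10000]):
        print(f"batch size {size:>6}: {trips:>9,} round trips, {eliminated:.2f}% eliminated")
```

The saving flattens out quickly: going from a batch size of 1 to 10 removes 900,000 round trips, while going from 100 to 1,000 removes only 9,000 more.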
I don't think that there is any one optimum batch size. I would say that larger batches are more performant, but once you have 10-100 documents per batch there will be very small performance improvements with larger batches.