New to MongoDB/pymongo, currently using the latest release, v3.2.2.
It looks as if insert_many is not performing as intended. I've noticed that even when supplying a generator to db.col.insert_many, memory usage still spikes, which makes inserting millions of documents difficult (though I do realize system memory should exceed collection size for best performance, so perhaps this is nothing I should worry about).
I was under the impression that if you pass a generator to insert_many, pymongo will 'buffer' the insert into 16 or 32 MB 'chunks'.
Performing this buffering/chunking manually avoids the spike. See below:
Example 1 = straight insert_many (high memory usage: 2.625 GB)
Example 2 = 'buffered' insert_many (expected low memory usage: ~300 MB)
from itertools import chain, islice
import pymongo

client = pymongo.MongoClient()
db = client['test']

def generate_kv(N):
    for i in range(N):
        yield {'x': i}

print("example 1")
db.testcol.drop()
db.testcol.insert_many(generate_kv(5000000))

def chunks(iterable, size=10000):
    iterator = iter(iterable)
    for first in iterator:
        yield chain([first], islice(iterator, size - 1))

print("example 2")
db.testcol.drop()
for c in chunks(generate_kv(5000000)):
    db.testcol.insert_many(c)
Any ideas? Bug? Am I using this wrong?
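For what it's worth, the manual buffering can also be done by approximate byte size instead of document count, closer to the 16 MB batching you describe. This is only a sketch: chunk_by_bytes is my own helper, not a pymongo API, and approx_size here is a crude repr-based stand-in for real BSON sizing.

```python
def chunk_by_bytes(docs, max_bytes=16 * 1024 * 1024,
                   approx_size=lambda d: len(repr(d))):
    """Group documents into lists whose approximate total size
    stays under max_bytes (a single oversized doc gets its own batch)."""
    batch, batch_bytes = [], 0
    for doc in docs:
        doc_bytes = approx_size(doc)
        # Flush the current batch before it would exceed the budget.
        if batch and batch_bytes + doc_bytes > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch
```

Each yielded batch is a plain list, so passing it to insert_many costs only that batch's worth of memory rather than the whole generator's.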
I think that happens because for insert_many pymongo needs to build the complete list of operations up front, not consume an iterable lazily. Only after that list is materialized is it sent to MongoDB and processed. If memory is a concern, you can insert documents one at a time with insert, or chunk the generator yourself before each .insert_many call (as in your example 2). This is normal behavior for databases.
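You can see the materialize-vs-chunk difference without a running mongod. fake_insert_many below is a hypothetical stand-in, not pymongo's implementation; it simply turns its argument into a list, the way a driver that needs the full operation list would.

```python
from itertools import chain, islice

def fake_insert_many(docs):
    """Stand-in for a driver call that materializes its input."""
    batch = list(docs)  # the whole iterable is pulled into memory here
    return len(batch)

def chunks(iterable, size=10000):
    iterator = iter(iterable)
    for first in iterator:
        yield chain([first], islice(iterator, size - 1))

def generate_kv(n):
    for i in range(n):
        yield {'x': i}

# One call: the generator is fully materialized, so peak memory ~ n docs.
total = fake_insert_many(generate_kv(100000))

# Chunked calls: at most `size` docs are alive per call.
peak = 0
inserted = 0
for c in chunks(generate_kv(100000), size=10000):
    n = fake_insert_many(c)
    peak = max(peak, n)
    inserted += n

print(total, inserted, peak)  # 100000 100000 10000
```

Both paths insert the same documents; only the peak number of documents held in memory at once differs.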