MongoDB and PyMongo both support bulk writes, i.e. inserting multiple documents at once. MongoDB:
db.collection_name.insertMany()
PyMongo:
collection.insert_many(list_of_objects)
But I couldn't find anything similar in MongoEngine for the same purpose. There are multiple approaches, but they all insert one item at a time. Is there really nothing like this, given that MongoEngine is built on top of PyMongo?
My requirement is that I have a huge amount of data to insert at once, and since processing every document takes time, I have to do a blind insert for performance. PyMongo has the functionality to do that, so if MongoEngine doesn't have anything similar, is it possible to use MongoEngine's underlying PyMongo instance just for this?
In MongoDB, the db.collection.bulkWrite() method performs multiple write operations with controls for order of execution.
Both PyMongo and MongoEngine can be used to access data from a MongoDB database. However, they work in very different ways and offer different features. PyMongo is the MongoDB recommended library. It makes it easy to use MongoDB documents and maps directly to the familiar MongoDB Query Language.
The bulkWrite() method can perform bulk insert, update, and delete operations. MongoDB also supports bulk insert through the db. collection.
For bulk insert, you've got 2 options:
1) PyMongo
If your dicts are already formatted exactly as they should be stored, use PyMongo directly. You'll get much better performance because you skip the overhead of the ORM/ODM library (object instantiation, validation, etc.).
As stated in the comments, you can access the pymongo Collection that's behind a Model class with Model._get_collection().
The added value is performance. The downside is that if any document is malformed (e.g. missing field, missing default value, wrong type, additional field), it will get inserted anyway, since you are bypassing MongoEngine, and you may get surprises later on when interacting with the data through your Model.
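To illustrate option 1, here is a minimal sketch of a batched raw insert. The helper names chunked and raw_bulk_insert and the batch_size default are my own, not part of PyMongo or MongoEngine. The only library call it relies on is Collection.insert_many(), and it assumes the collection you pass in is the one returned by Model._get_collection().

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def raw_bulk_insert(collection, docs, batch_size=1000):
    """Insert plain dicts in batches, bypassing MongoEngine entirely.

    `collection` is assumed to be a pymongo Collection, e.g. the object
    returned by Person._get_collection(). No validation or defaults are
    applied: whatever is in each dict is what gets stored.
    """
    inserted = 0
    for batch in chunked(docs, batch_size):
        # ordered=False lets the server continue past a failing document
        # instead of stopping at the first error.
        result = collection.insert_many(batch, ordered=False)
        inserted += len(result.inserted_ids)
    return inserted
```

With the Person model from the example further down, usage would look like raw_bulk_insert(Person._get_collection(), list_of_dicts). Batching is only there so that a very large input does not have to be sent as one giant call; tune batch_size to your data.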
2) MongoEngine
If you have an array of Model instances, then you can do the bulk insert in MongoEngine using:
Model.objects.insert(your_array)
If you can construct your object with Model(**dict).save(), then you can do:
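from mongoengine import Document, IntField, StringField

# Assumes mongoengine.connect(...) has already been called
class Person(Document):
    name = StringField()
    age = IntField(default=32)

array = [{'name': 'John'}, {'name': 'Hulk', 'age': 100}]
person_instances = [Person(**data) for data in array]
# load_bulk=False returns just the ObjectIds instead of reloading the documents
Person.objects.insert(person_instances, load_bulk=False)
# Would insert the following
# [{'_id': ObjectId('...'), 'age': 32, 'name': 'John'},
#  {'_id': ObjectId('...'), 'age': 100, 'name': 'Hulk'}]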
The advantage is that the documents you insert are guaranteed to match your MongoEngine Model (in my example, that means accounting for the default value of age when it's not in the dict). The downside is that there is a performance cost.
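If you go the MongoEngine route with a large input, you may want to build and insert the instances in batches rather than all at once. Below is a rough sketch. The helper name insert_validated and its batch_size parameter are hypothetical, and model_cls is assumed to be a MongoEngine Document subclass such as Person. The explicit doc.validate() call is there because I would not rely on objects.insert() running full field validation in every MongoEngine version; drop it if you know your version validates on insert.

```python
def insert_validated(model_cls, rows, batch_size=500):
    """Build, validate and bulk-insert documents in batches.

    `model_cls` is assumed to be a MongoEngine Document subclass and
    `rows` an iterable of dicts usable as model_cls(**row).
    Returns the number of documents handed to objects.insert().
    """
    total = 0
    batch = []
    for row in rows:
        doc = model_cls(**row)   # applies field defaults
        doc.validate()           # raises if a field value is invalid
        batch.append(doc)
        if len(batch) >= batch_size:
            # load_bulk=False avoids reloading the inserted documents
            model_cls.objects.insert(batch, load_bulk=False)
            total += len(batch)
            batch = []
    if batch:
        # flush the final, possibly smaller batch
        model_cls.objects.insert(batch, load_bulk=False)
        total += len(batch)
    return total
```

Called as insert_validated(Person, array), this keeps the defaults-and-validation benefit while bounding how many instances sit in memory at once. Note that a validation error aborts the run at that row, with earlier batches already written.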
In short, it all depends on whether your primary need is performance or whether you can live with MongoEngine's overhead.