 

Bulk Write in MongoEngine

MongoDB and PyMongo both support bulk writes, i.e. inserting multiple documents at once. MongoDB:

db.collection_name.insertMany()

PyMongo:

collection.insert_many([list_of_documents])

But I couldn't find anything similar in MongoEngine for the same purpose. There are multiple approaches, but all of them insert one item at a time. Is there really nothing like this, given that MongoEngine is built on top of PyMongo?

My requirement is that I have a huge amount of data to insert at once, and since processing every document takes time, I have to do a blind insert for performance. PyMongo has the functionality to do that, so if MongoEngine doesn't have anything similar, is it possible to use MongoEngine's underlying PyMongo instance for just this?

Kousik asked Oct 04 '19

People also ask

What is bulk write in MongoDB?

In MongoDB, the db.collection.bulkWrite() method performs multiple write operations with controls for order of execution.

Which is better PyMongo or MongoEngine?

Both PyMongo and MongoEngine can be used to access data from a MongoDB database. However, they work in very different ways and offer different features. PyMongo is the MongoDB recommended library. It makes it easy to use MongoDB documents and maps directly to the familiar MongoDB Query Language.

What is the principal implication of using a bulk write operation in MongoDB?

The db.collection.bulkWrite() method provides the ability to perform bulk insert, update, and delete operations. MongoDB also supports bulk insert through the db.collection.insertMany() method.


1 Answer

For bulk insert, you've got 2 options:

1) Pymongo

If your dicts are already formatted in the exact shape in which they should be stored, then use PyMongo; you'll get much better performance because you save the overhead of the ORM/ODM layer (object instantiation, validation, etc.).

As stated in the comments, you can access the pymongo.Collection that's behind a Model class with Model._get_collection().

The added value is performance. The downside is that if any document is malformed (e.g. missing field, missing default value, wrong type, additional field), it will get inserted anyway since you are bypassing MongoEngine, and you may get surprises later when interacting with the data through your Model.

2) MongoEngine

If you have an array of Model instances, then you can do the bulk insert in MongoEngine using:

Model.objects.insert(your_array)

If you can construct your objects with Model(**dict).save(), then you can do the following:

class Person(Document):
    name = StringField()
    age = IntField(default=32)

array = [{'name': 'John'}, {'name': 'Hulk', 'age': 100}]
person_instances = [Person(**data) for data in array]

Person.objects.insert(person_instances, load_bulk=False)

# Would insert the following
#[{'_id': ObjectId('...'), 'age': 32, 'name': 'John'},
# {'_id': ObjectId('...'), 'age': 100, 'name': 'Hulk'}]

The advantage is that it guarantees that the documents you insert are valid against your MongoEngine Model (in my example, that means accounting for the default value of age when it's not in the dict). The downside is the performance cost.

In short, it all depends if your primary need is performance or if you can live with MongoEngine's overhead.

bagerard answered Oct 21 '22