I have some data like this:
data = [{'_id': 1, 'val': 5},
{'_id': 2, 'val': 1}}]
current data in db:
>>> db.collection.find_one()
{'_id': 1, 'val': 3}
I always receive unique rows but am not sure if any of them already exists in DB (such as the case above). And I want to update them based on two types of requirements.
Requirement 1:
Do NOT update the rows if _id
already exists. This is kinda easy in a way:
from pymongo.errors import BulkWriteError
try:
db.collection.insert_many(data, unordered=False)
except BulkWriteError:
pass
executing the above would insert 2nd
row but won't update the first; but it also raises the exception.
1. Is there any better way of doing the above operation (for bulk inserts) ?
Requirement 2
This is similar to update_if_exists
& insert if not exists
combined. So the following data:
data2 = [{'_id': 1, 'val': 9},
{'_id': 3, 'val': 4}}]
should update the row with _id=1
and insert the 2nd
row in DB.
The problem is I get thousands of rows at one time and am not sure if checking and updating one-by-one is efficient.
2. Is this requirement possible in MongoDB without iterating over each row and with as few operations as possible ?
You can generate a list of updates to pass to bulk write API that will send all the operations together but they will still be executed one by one on the server, but without causing an error.
from pymongo import UpdateOne
data2 = [{'_id': 1, 'val': 9}, {'_id': 3, 'val': 4}]
upserts=[ UpdateOne({'_id':x['_id']}, {'$setOnInsert':x}, upsert=True) for x in data2]
result = db.test.bulk_write(upserts)
You can see in the result that when _id is found the operation is a no-op, but when it's not found, it's an insert.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With