insert_many with upsert - PyMongo [duplicate]

I have some data like this:

data = [{'_id': 1, 'val': 5},
        {'_id': 2, 'val': 1}]

Current data in the DB:

>>> db.collection.find_one()
    {'_id': 1, 'val': 3}

I always receive unique rows, but I am not sure whether any of them already exist in the DB (as in the case above). I want to write them according to one of two requirements.

Requirement 1:

Do NOT update a row if its _id already exists. This part is fairly easy:

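from pymongo.errors import BulkWriteError

try:
    # ordered=False lets the server continue past duplicate-key errors
    db.collection.insert_many(data, ordered=False)
except BulkWriteError:
    # raised because _id 1 already exists; _id 2 was still inserted
    pass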

Executing the above inserts the second row and leaves the first one unchanged, but it still raises a BulkWriteError (because _id 1 already exists), which has to be caught.

1. Is there a better way to do the above operation (for bulk inserts)?

Requirement 2:

This is update-if-exists and insert-if-not-exists combined (an upsert). So the following data:

data2 = [{'_id': 1, 'val': 9},
         {'_id': 3, 'val': 4}]

should update the row with _id=1 and insert the second row into the DB.

The problem is that I receive thousands of rows at a time, and I am not sure that checking and updating them one by one is efficient.

2. Is this possible in MongoDB without iterating over each row, using as few operations as possible?

asked May 25 '16 at 11:05 by Kevad


1 Answer

You can build a list of upsert operations and pass it to the bulk write API. All the operations are sent to the server together; they are still executed one by one on the server, but an existing _id does not cause an error.

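from pymongo import UpdateOne
data2 = [{'_id': 1, 'val': 9}, {'_id': 3, 'val': 4}]
# $setOnInsert only applies when the upsert inserts a new document;
# if a document with that _id already exists, the operation is a no-op.
upserts = [UpdateOne({'_id': x['_id']}, {'$setOnInsert': x}, upsert=True)
           for x in data2]
# `db` is an existing pymongo Database handle; one bulk_write call sends everything
result = db.test.bulk_write(upserts)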

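In the result you can see that when the _id is found, the operation is a no-op, and when it is not found, it is an insert: new documents are counted in result.upserted_count, while existing ones are counted in result.matched_count with result.modified_count staying at 0.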

answered Sep 16 '22 at 16:09 by Asya Kamsky