Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Do bulk inserts/update in MongoDB with PyMongo

How do I bulk update/insert in mongoDb with pymongo/pandas. The error I get is batch op errors occurred I reason I get is because I set the "_id", which I want to do. I code runs fine on first run, but on second run it fails. I want to use pandas in workflow. The data does have a datetime object.

The syntax is completely different for upsert = True, with Update. An efficient solution with update would be helpful, where "_id" or "qid" could be set. But, there are python datetime objects!

InSQL   = 'SELECT * from  database2.table2 '
sqlOut  = pd.read_sql(InSQL,cxn)
sqlOut['_id'] = "20170101" + ":"+ sqlOut['Var']   

dfOut   = sqlOut.to_json(orient='records',date_format='iso' )
try:
    db["test"].insert_many(json.loads(dfOut))
except Exception as e:  print e

I have given a 50pt bounty, which expired, with no answer. Hmm...

like image 271
Merlin Avatar asked Feb 22 '17 17:02

Merlin


People also ask

What is bulk insert in MongoDB?

Multiple documents can be inserted at a time in MongoDB using bulk insert operation where an array of documents is passed to the insert method as parameter.

How do you update a collection in PyMongo?

Update Collection You can update a record, or document as it is called in MongoDB, by using the update_one() method. The first parameter of the update_one() method is a query object defining which document to update. Note: If the query finds more than one record, only the first occurrence is updated.

Is PyMongo the same as MongoDB?

PyMongo is MongoDB's official native driver for Python. It's a library that lets you connect to a MongoDB database and query the data stored using the MongoDB Query API. It is the recommended way to interface with the document database.


2 Answers

You get an error because you try to insert documents with fields which conflict with that of existing documents on the second and subsequent insert_many calls. You correctly inferred it may be due to your setting _id explicitly, which would then conflict with existing _id values in the collection.

MongoDB automatically creates an unique index on _id, which forbids duplicating values.

You need to update or replace your documents on calls after the first one (which inserted the documents in their first version). There is indeed a concept of "upsert" which will take care of inserting non-previously-existing documents in the collection as well as updating the existing ones.

Your options:

  • Most efficient: pymongo.collection.Collection.bulk_write

    import pymongo
    
    operations = [pymongo.operations.ReplaceOne(
        filter={"_id": doc["_id"]}, 
        replacement=doc,
        upsert=True
        ) for doc in json.loads(dfOut)]
    
    result = db["test"].bulk_write(operations)
    # handle results
    

Note that it's efficiency also depends on whether the field is indexed in the collection, which incidentally is the case for _id. (also see pymongo.operations.ReplaceOne)

  • Loop over your collection and calling pymongo.collection.Collection.update_one or pymongo.collection.Collection.replace_one(inefficient because not bulk)

    import pymongo
    
    results = []
    for doc in json.load(dfOut):
        result = db["test"].replace_one(
            filter={"_id": doc["_id"]},
            replacement=doc,
            upsert=True
            )
        results.append(result)
    
    # handle results
    

Note: pymongo.collection.Collection.update_many seems unsuitable for your needs since you are not trying to set the same value on all matches of a given filter.

like image 112
korrigan Avatar answered Oct 17 '22 14:10

korrigan


batch op error maybe caused by duplicate _id, So delete the same _id documents already in mongo before inserting

Or use update_many https://api.mongodb.com/python/current/api/pymongo/collection.html?highlight=update#pymongo.collection.Collection.update_many

https://docs.mongodb.com/manual/reference/method/db.collection.updateMany/

like image 31
Kun Avatar answered Oct 17 '22 13:10

Kun