How do I bulk update/insert in MongoDB with pymongo/pandas?
The error I get is "batch op errors occurred". The reason I get it is that I set the "_id", which I want to do. The code runs fine on the first run, but fails on the second. I want to use pandas in the workflow. The data does have a datetime object.
The syntax is completely different for upsert=True with update. An efficient solution with update would be helpful, where "_id" or "qid" could be set. But there are Python datetime objects!
import json
import pandas as pd

# Pull the source rows and build a deterministic "_id" per document
InSQL = 'SELECT * from database2.table2 '
sqlOut = pd.read_sql(InSQL, cxn)
sqlOut['_id'] = "20170101" + ":" + sqlOut['Var']
# Serialize datetimes as ISO strings so the records are JSON-safe
dfOut = sqlOut.to_json(orient='records', date_format='iso')
try:
    db["test"].insert_many(json.loads(dfOut))
except Exception as e:
    print(e)
I have given a 50pt bounty, which expired with no answer. Hmm...
You get an error because you try to insert documents whose fields conflict with those of existing documents on the second and subsequent insert_many calls. You correctly inferred it is due to setting _id explicitly, which then conflicts with existing _id values in the collection.
MongoDB automatically creates a unique index on _id, which forbids duplicate values.
You need to update or replace your documents on calls after the first one (which inserted the documents in their first version). There is indeed a concept of "upsert" which will take care of inserting non-previously-existing documents in the collection as well as updating the existing ones.
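For illustration, here is a minimal sketch of that upsert behavior with update_one (the "20170101:A" _id just mirrors the format built in the question, and db is assumed to be an existing pymongo database handle): the first call inserts the document, the second updates it in place instead of raising a duplicate-key error.

from datetime import datetime

# First call: no document with this _id exists yet, so it is inserted
db["test"].update_one(
    {"_id": "20170101:A"},
    {"$set": {"Var": "A", "ts": datetime(2017, 1, 1)}},
    upsert=True
)
# Second call: the document now exists, so it is updated in place
db["test"].update_one(
    {"_id": "20170101:A"},
    {"$set": {"Var": "A", "ts": datetime(2017, 1, 2)}},
    upsert=True
)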
Your options:
Most efficient: pymongo.collection.Collection.bulk_write
import pymongo

# One ReplaceOne upsert per document, sent to the server in a single batch
operations = [pymongo.operations.ReplaceOne(
    filter={"_id": doc["_id"]},
    replacement=doc,
    upsert=True
) for doc in json.loads(dfOut)]
result = db["test"].bulk_write(operations)
# handle results
Note that its efficiency also depends on whether the field used in the filter is indexed in the collection, which is incidentally the case for _id.
(also see pymongo.operations.ReplaceOne)
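As a sketch of what the "# handle results" step could look like: bulk_write returns a BulkWriteResult whose counters let you tell fresh inserts apart from updates.

result = db["test"].bulk_write(operations)
print("upserted:", result.upserted_count)  # inserted because no matching _id existed
print("matched:", result.matched_count)    # documents whose _id already existed
print("modified:", result.modified_count)  # matched documents that actually changed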
Loop over your documents and call pymongo.collection.Collection.update_one or pymongo.collection.Collection.replace_one (inefficient, because not bulk):
import pymongo

results = []
for doc in json.loads(dfOut):  # json.loads, not json.load: dfOut is a string
    result = db["test"].replace_one(
        filter={"_id": doc["_id"]},
        replacement=doc,
        upsert=True
    )
    results.append(result)
# handle results
Note: pymongo.collection.Collection.update_many
seems unsuitable for your needs since you are not trying to set the same value on all matches of a given filter.
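One more note on the datetime objects mentioned in the question: the to_json/json.loads round trip leaves them as ISO strings in MongoDB, not as BSON dates. If you want real date types, a sketch of an alternative is to skip the JSON step entirely, since pandas Timestamp values subclass datetime and pymongo encodes them natively (this assumes the DataFrame holds no NaT values, which bson may not encode cleanly):

# Build the documents straight from the DataFrame, keeping native datetimes
docs = sqlOut.to_dict(orient='records')
operations = [pymongo.operations.ReplaceOne(
    filter={"_id": doc["_id"]},
    replacement=doc,
    upsert=True
) for doc in docs]
db["test"].bulk_write(operations)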
The batch op error may be caused by duplicate _id values, so delete the documents with the same _id already in Mongo before inserting.
Or use update_many: https://api.mongodb.com/python/current/api/pymongo/collection.html?highlight=update#pymongo.collection.Collection.update_many
https://docs.mongodb.com/manual/reference/method/db.collection.updateMany/
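A minimal sketch of that delete-then-insert approach, reusing dfOut from the question:

docs = json.loads(dfOut)
# Drop any previously inserted documents sharing an _id, then insert fresh copies
db["test"].delete_many({"_id": {"$in": [doc["_id"] for doc in docs]}})
db["test"].insert_many(docs)

Note that this is two round trips and not atomic, so a single bulk upsert as in the other answer is usually preferable.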