
Duplicate key error on upsert with multiple processes (MongoDB >= 3.0.4, WiredTiger)

Tags:

mongodb


I just got a weird error from our application:

When updating from two processes, it complains of a duplicate key error on a collection with a unique index, even though the operation in question is an upsert.

Reproduction code:

import time
from pymongo import MongoClient, DESCENDING

bucket = MongoClient('127.0.0.1', 27017)['test']['foo']
bucket.drop()
# Seed one document, then build a unique descending index on 'timestamp'.
bucket.update({'timestamp': 0}, {'$addToSet': {'_exists_caps': 'cap15'}}, upsert=True, safe=True, w=1, wtimeout=10)
bucket.create_index([('timestamp', DESCENDING)], unique=True)
while True:
    # Microsecond timestamp as a string; two processes will occasionally
    # generate the same value, which the upsert should handle.
    timestamp = str(int(1000000 * time.time()))
    bucket.update({'timestamp': timestamp}, {'$addToSet': {'_exists_foos': 'fooxxxxx'}}, upsert=True, safe=True, w=1, wtimeout=10)

When I run the script in two processes at once, PyMongo raises:

Traceback (most recent call last):
  File "test_mongo_update.py", line 11, in <module>
    bucket.update({'timestamp': timestamp}, {'$addToSet': {'_exists_foos': 'fooxxxxx'}}, upsert=True, safe=True, w=1, wtimeout=10)
  File "build/bdist.linux-x86_64/egg/pymongo/collection.py", line 552, in update
  File "build/bdist.linux-x86_64/egg/pymongo/helpers.py", line 202, in _check_write_command_response
pymongo.errors.DuplicateKeyError: E11000 duplicate key error collection: test.foo index: timestamp_-1 dup key: { : "1439374020348044" }

Env:

  • mongodb 3.0.5, WiredTiger

  • single mongodb instance

  • pymongo 2.8.1

mongo.conf

systemLog:
   destination: file
   logAppend: true
   logRotate: reopen
   path: /opt/lib/log/mongod.log

# Where and how to store data.
storage:
   dbPath: /opt/lib/mongo
   journal:
     enabled: true

   engine: "wiredTiger"
   directoryPerDB: true

# how the process runs
processManagement:
   fork: true  # fork and run in background
   pidFilePath: /opt/lib/mongo/mongod.pid

# network interfaces
net:
   port: 27017
   bindIp: 0.0.0.0  # Listen on all interfaces; set to 127.0.0.1 to listen on the local interface only.

setParameter:
   enableLocalhostAuthBypass: false

Any thoughts on what could be going wrong here?

PS:

I retried the same case with the MMAPv1 storage engine and it works fine. Why?

I found something related here: https://jira.mongodb.org/browse/SERVER-18213

but even after that bug fix this error still occurs, so it looks like the bug was not fixed completely.

Cheers

asked Aug 12 '15 by brain.zhang



2 Answers

I found the bug at: https://jira.mongodb.org/browse/SERVER-14322

Please feel free to vote for it and watch it for further updates.

answered Oct 17 '22 by brain.zhang


An upsert first checks for an existing document to update, and inserts a new document if none is found.

My best guess is you are running into a race condition where:

  1. Process 2 checks for an existing document and finds none
  2. Process 1 checks for an existing document and finds none
  3. Process 2 inserts, which succeeds
  4. Process 1 inserts, which raises the duplicate key error.

First, check what query your Python driver is actually sending underneath, and confirm it is what you expect on the native MongoDB side. Then, if you can reproduce this semi-regularly on WiredTiger but never on MMAPv1, raise a bug with MongoDB to confirm the expected behaviour; it is sometimes hard to tell exactly what they guarantee to be atomic.
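If you need to work around the race on the application side, the usual pattern is to catch the duplicate key error and retry the upsert: the retry will find the document that the competing process just inserted and take the update path instead. A minimal sketch (the helper name `retry_on_duplicate` and the injectable `exc_type` parameter are my own; in practice you would pass `pymongo.errors.DuplicateKeyError`):

```python
def retry_on_duplicate(op, exc_type, retries=3):
    # Run op(); if it raises exc_type (e.g. pymongo.errors.DuplicateKeyError),
    # try again. On retry the upsert matches the document the competing
    # process inserted, so a single retry normally suffices.
    for attempt in range(retries):
        try:
            return op()
        except exc_type:
            if attempt == retries - 1:
                raise

# Hypothetical usage against the collection from the question:
# retry_on_duplicate(
#     lambda: bucket.update({'timestamp': timestamp},
#                           {'$addToSet': {'_exists_foos': 'fooxxxxx'}},
#                           upsert=True),
#     DuplicateKeyError)
```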

This is a good example of why Mongo ObjectIDs combine a timestamp, a machine id, a pid and a counter for uniqueness.
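To illustrate that idea, here is a rough sketch of an ObjectId-style generator (a simplified, hypothetical layout, not the real 12-byte BSON ObjectId format): because each id combines a timestamp, the pid, and a locked counter, concurrent generators cannot collide the way the timestamp-only key above does.

```python
import os
import struct
import threading
import time

_counter = 0
_lock = threading.Lock()

def objectid_like():
    # Simplified ObjectId-style id: 4-byte timestamp + 3-byte pid +
    # 3-byte counter. The counter disambiguates ids generated within the
    # same second by one process; the pid separates processes.
    global _counter
    with _lock:
        _counter = (_counter + 1) % 0xFFFFFF
        count = _counter
    return (struct.pack('>I', int(time.time()))
            + (os.getpid() & 0xFFFFFF).to_bytes(3, 'big')
            + count.to_bytes(3, 'big'))
```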

answered Oct 17 '22 by Matt