 

MongoDB Duplicate Documents even after adding unique key

I have created a collection and added a unique index like this:

db.user_services.createIndex({"uid":1 , "sid": 1},{unique:true,dropDups: true}) 

The collection "user_services" looks something like this:

{
    "_id" : ObjectId("55068b35f791c7f81000002d"),
    "uid" : 15,
    "sid" : 1,
    "rate" : 5
},
{
    "_id" : ObjectId("55068b35f791c7f81000002f"),
    "uid" : 15,
    "sid" : 1,
    "rate" : 4
}

Problem :

I am using the PHP driver to insert documents with the same uid and sid, and they are still getting inserted.

What I want

  1. On the Mongo shell: add a unique key on uid and sid so that no two documents share the same uid and sid.
  2. On the PHP side: something like MySQL's "INSERT (value) ON DUPLICATE KEY UPDATE rate = rate + 1". That is, whenever I try to insert a document, it should be inserted if not already there; otherwise the rate field of the existing document should be updated.
Raj Nandan Sharma asked Mar 16 '15 08:03




2 Answers

Congratulations, you appear to have found a bug. This only happens with MongoDB 3.0.0 in my testing; at least it is not present in MongoDB 2.6.6. The bug is now recorded at SERVER-17599.

NOTE: Not actually an "issue" but confirmed "by design". The dropDups option was dropped for version 3.0.0, though it is still listed in the documentation.

The problem is that the index is not being created: index creation errors out when you attempt it on a collection that already contains duplicates on the "compound key" fields. With the data above, the index creation should yield this in the shell:

{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 1,
    "errmsg" : "exception: E11000 duplicate key error dup key: { : 15.0, : 1.0 }",
    "code" : 11000,
    "ok" : 0
}

When there are no duplicates present, you can create the index as you are currently trying and it will be created.

So to work around this, first remove the duplicates with a procedure like this:

db.events.aggregate([
    { "$group": {
        "_id": { "uid": "$uid", "sid": "$sid" },
        "dups": { "$push": "$_id" },
        "count": { "$sum": 1 }
    }},
    { "$match": { "count": { "$gt": 1 } }}
]).forEach(function(doc) {
    doc.dups.shift();
    db.events.remove({ "_id": { "$in": doc.dups } });
});

db.events.createIndex({ "uid": 1, "sid": 1 }, { unique: true })

Then further inserts containing duplicate data will not be inserted and the appropriate error will be recorded.

The final note here is that "dropDups" is/was not a very elegant solution for removing duplicate data. You really want something with more control as demonstrated above.
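The dedup pass above can be sketched in plain JavaScript without a MongoDB server (collection and field names match the aggregation above; the in-memory grouping stands in for $group/$push): group documents by the compound key, keep the first _id in each group, and collect the rest for removal.

```javascript
// In-memory sketch of the "group by (uid, sid), keep the first,
// remove the rest" logic from the aggregation pipeline above.
function findDuplicateIds(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Compound key, like "_id": { "uid": "$uid", "sid": "$sid" } in $group
    const key = JSON.stringify({ uid: doc.uid, sid: doc.sid });
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc._id);
  }
  const dups = [];
  for (const ids of groups.values()) {
    // ids.slice(1) keeps the first document, mirroring doc.dups.shift()
    dups.push(...ids.slice(1));
  }
  return dups;
}
```

The returned ids are exactly what the shell procedure passes to remove() via "$in"; once they are gone, the unique index can be built.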

For the second part, rather than using .insert(), use the .update() method. It has an "upsert" option:

$collection->update(
    array( "uid" => 1, "sid" => 1 ),
    array( '$set' => $someData ),
    array( 'upsert' => true )
);

So the "found" documents are "modified" and the documents not found are "inserted". Also see $setOnInsert for a way to only create certain data when the document is actually inserted and not when modified.
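Those upsert semantics can be modeled in-memory as well. The sketch below is a toy model, not the real driver: it keys a Map by (uid, sid), inserts when there is no match, and otherwise increments rate, which is the MySQL "on duplicate key update rate = rate + 1" behaviour the question asks for (the starting rate of 1 on insert is an assumption).

```javascript
// Toy model of update(query, update, { upsert: true }) with an
// $inc on rate: no match -> insert, match -> increment in place.
function upsertRate(store, uid, sid) {
  const key = `${uid}:${sid}`;
  if (store.has(key)) {
    store.get(key).rate += 1;               // matched: modify
  } else {
    store.set(key, { uid, sid, rate: 1 });  // not found: insert
  }
  return store.get(key);
}
```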


For your specific attempt, the correct syntax of .update() takes three arguments: "query", "update", and "options":

$collection->update(
    array( "uid" => 1, "sid" => 1 ),
    array(
        '$set' => array( "field" => "this" ),
        '$inc' => array( "counter" => 1 ),
        '$setOnInsert' => array( "newField" => "another" )
    ),
    array( "upsert" => true )
);

None of the update operations are allowed to "access the same path" as used in another update operation in that "update" document section.
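That rule can be illustrated with a hypothetical helper (findPathConflicts is my own name, not a MongoDB API) that flags a field appearing under more than one update operator. It only checks exact path matches; the real server also rejects prefix conflicts such as "a" vs "a.b", which this simplification ignores.

```javascript
// Hypothetical checker: each field path may appear under at most one
// update operator ($set, $inc, $setOnInsert, ...) per update document.
function findPathConflicts(updateDoc) {
  const seen = new Set();
  const conflicts = [];
  for (const fields of Object.values(updateDoc)) {
    for (const path of Object.keys(fields)) {
      if (seen.has(path)) conflicts.push(path);  // same path, second operator
      seen.add(path);
    }
  }
  return conflicts;
}
```

The update document above passes ("field", "counter", and "newField" are distinct), while putting "counter" under both $set and $inc would be rejected.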

Neil Lunn answered Oct 23 '22 00:10


I feel like the current most popular answer is a little too local and detailed for such an elementary MongoDB operation - removing duplicates from mongo by a key.

Removing duplicates by a key for mongo > 3.0 is simple. Just run this query, replacing yourDuplicateKey and assuming _id is your primary key (make sure you mongodump first, just in case):

db.yourCollection.aggregate([
    { "$group": {
        "_id": { "yourDuplicateKey": "$yourDuplicateKey" },
        "dups": { "$push": "$_id" },
        "count": { "$sum": 1 }
    }},
    { "$match": { "count": { "$gt": 1 } }}
]).forEach(function(doc) {
    doc.dups.shift();
    db.yourCollection.remove({ "_id": { "$in": doc.dups } });
});
chakeda answered Oct 23 '22 00:10