
Duplicate documents on _id (in mongo)

I have a sharded mongo collection with over 1.5 million documents. I use the _id field as the shard key, and its values are integers (rather than ObjectIds).

I do a lot of write operations on this collection, using the Perl driver (insert, update, remove, save) and mongoimport.

My problem is that somehow, I have duplicate documents on the same _id. From what I've read, this shouldn't be possible.

I've removed the duplicates, but others still appear.

Do you have any idea where they could come from, or what I should start looking at? (I've also tried to reproduce this on a smaller test collection, but no duplicates are inserted there, no matter which write operation I perform.)

klaoo z asked Jun 28 '12 09:06




1 Answer

This actually isn't a problem with the Perl driver; it is related to the characteristics of sharding. MongoDB can only enforce uniqueness among the documents located on a single shard at the time of creation, so the default index does not require uniqueness.
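To make the per-shard point concrete, here is a toy Python model. It is not MongoDB or driver code: the `Shard` class, `DuplicateKeyError`, and `find_cross_shard_duplicates` are names invented for this illustration, and it assumes each shard checks `_id` only against its own documents.

```python
class DuplicateKeyError(Exception):
    """Raised when a shard already holds a document with the same _id."""


class Shard:
    """Hypothetical stand-in for one shard: uniqueness is checked locally only."""

    def __init__(self, name):
        self.name = name
        self.docs = {}  # maps _id -> document

    def insert(self, doc):
        # The check only sees documents stored on this shard.
        if doc["_id"] in self.docs:
            raise DuplicateKeyError(f"{self.name}: duplicate _id {doc['_id']}")
        self.docs[doc["_id"]] = doc


def find_cross_shard_duplicates(shards):
    """Return the sorted _id values that are stored on more than one shard."""
    seen = {}
    for shard in shards:
        for _id in shard.docs:
            seen.setdefault(_id, []).append(shard.name)
    return sorted(_id for _id, names in seen.items() if len(names) > 1)


shard_a, shard_b = Shard("shard_a"), Shard("shard_b")
shard_a.insert({"_id": 42, "v": "first"})
# Same _id lands on another shard: no local conflict, so the insert succeeds.
shard_b.insert({"_id": 42, "v": "second"})
shard_b.insert({"_id": 7, "v": "other"})

duplicates = find_cross_shard_duplicates([shard_a, shard_b])
print(duplicates)  # [42]
```

A second insert of `_id` 42 into `shard_a` would raise `DuplicateKeyError`, which is the only check this model performs. Grouping by `_id` across shards, as the helper does, is one way to start looking for existing duplicates.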

In the MongoDB: Configuring Sharding documentation there is specific mention that:

  • When you shard a collection, you must specify the shard key. If there is data in the collection, mongo will require an index to be created upfront (it speeds up the chunking process); otherwise, an index will be automatically created for you.

  • You can use the {unique: true} option to ensure that the underlying index enforces uniqueness so long as the unique index is a prefix of the shard key.

  • If the "unique: true" option is not used, the shard key does not have to be unique.
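As a rough sketch of the second point, the snippet below builds the document for the `shardCollection` admin command with `unique` set. Python is used here only for illustration (the question uses the Perl driver), `shard_collection_command` is a hypothetical helper, and `mydb.mycoll` is a placeholder namespace. Nothing here contacts a server, and whether this can be applied to a collection that is already sharded is not something the quoted documentation covers.

```python
def shard_collection_command(namespace, key_field="_id", unique=True):
    """Build the document for MongoDB's shardCollection admin command."""
    # The command name has to be the first key; Python dicts keep insertion order.
    return {
        "shardCollection": namespace,
        "key": {key_field: 1},
        "unique": unique,
    }


# "mydb.mycoll" is a placeholder, not a name taken from the question.
cmd = shard_collection_command("mydb.mycoll")
print(cmd)

# With PyMongo and a client connected to a mongos, this document would
# typically be sent with: client.admin.command(cmd)
```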

Stennie answered Oct 28 '22 20:10