Here's an example of my docs:
[{name:"duplicate", value:true, id:2910921},{name:"duplicate", value:true, id:32838293},{name:"duplicate", value:false, id:3283232},{name:"notDuplicate", value:true, id:382932}]
I want to remove documents when more than one of them has the same name and the same value. In the example above it would remove one document, either {name:"duplicate", value:true, id:2910921} or {name:"duplicate", value:true, id:32838293}; it doesn't matter to me which one.
So far, I've considered just creating a new field for each of these which would be something like newField: "duplicatetrue" and then I could just use distinct on these to remove dupes, but I am having trouble figuring out how to concat two different fields with different types into a new field. I'm definitely open to better suggestions as well. Here's what I have so far:
db.collection(collectionName).updateMany({}, {$set: {"newField": ["$name","$value"] }})
However, the above line doesn't output the values, rather it outputs exactly newField: ["$name","$value"]
Removing the quotes from $name and $value does not work either.
I'm using the Node mongodb driver: 3.5.8
You can do it in two ways: rewrite the collection with $out (Step 1), or collect the _ids of the duplicates and delete them with .deleteMany() (Step 2). $out is destructive and, with millions of documents in the collection, can be an issue in a production environment; in that case, first read all the _ids of the documents to be deleted and then use .deleteMany() to delete them all at once. (You can use any unique identifier on a doc instead of _id, but I've used _id because it's indexed by default, which helps deleteMany() run quicker.)
Step 1:
Using $out. As I've said, it is destructive: it overwrites the entire collection if the output name matches an existing collection, or otherwise creates a new collection from the result of your aggregation query. So test your aggregation query very well before adding $out as the last stage. Also, write the data to a temporary collection and rename the collections once everything looks good. Plan for some downtime while renaming the collections.
Query :
db.collection.aggregate([
  {
    $group: {
      _id: { name: "$name", value: "$value" },
      doc: { $last: "$$ROOT" } // Retrieve only last doc in a group
    }
  },
  {
    $replaceRoot: { newRoot: "$doc" } // replace doc as object as new root of document
  },
  { $out : 'collection_new' } // Test above aggregation & then use this
])
Test : mongoplayground
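The temporary-collection advice above can be sketched with the Node driver roughly as follows. This is only a sketch under assumptions: buildDedupePipeline and dedupeViaTempCollection are hypothetical helper names, db is an already connected Db instance, and the "empty output" check is just a minimal sanity test, not a full verification.

```javascript
// Hypothetical helper: the Step 1 pipeline, writing to a temporary collection.
function buildDedupePipeline(tempName) {
  return [
    {
      $group: {
        _id: { name: "$name", value: "$value" },
        doc: { $last: "$$ROOT" } // keep one doc per (name, value) group
      }
    },
    { $replaceRoot: { newRoot: "$doc" } },
    { $out: tempName }
  ];
}

// Hypothetical helper: `db` is assumed to be a connected Db from the Node mongodb driver.
async function dedupeViaTempCollection(db, sourceName, tempName) {
  const source = db.collection(sourceName);
  const before = await source.countDocuments();

  // $out only runs once the cursor is consumed; toArray() resolves to [].
  await source.aggregate(buildDedupePipeline(tempName)).toArray();

  const temp = db.collection(tempName);
  const after = await temp.countDocuments();
  if (after === 0 && before > 0) {
    throw new Error("Deduplicated collection is empty; not renaming");
  }

  // dropTarget: true replaces the original collection with the temporary one.
  await temp.rename(sourceName, { dropTarget: true });
  return { before, after };
}
```

Calling dedupeViaTempCollection(db, collectionName, 'collection_new') would leave the deduplicated data under the original name. Until the rename runs, the original collection is untouched, so the temporary collection can be inspected first.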
Step 2:
Read the _ids to be deleted from the collection.
Query :
db.collection.aggregate([
  /**
   * Group on matching docs :
   * { name: "duplicate", value: false},
   * { name: "duplicate", value: true},
   * { name: "duplicate-yes", value: true},
   * { name: "notDuplicate", value: true}
   */
  {
    $group: {
      _id: { name: "$name", value: "$value" },
      _idsNeedsToBeDeleted: { $push: "$$ROOT._id" } // push all `_id`'s to an array
    }
  },
  /** Remove first element - which is removing a doc */
  {
    $project: {
      _id: 0,
      _idsNeedsToBeDeleted: { $slice: [ "$_idsNeedsToBeDeleted", 1, { $size: "$_idsNeedsToBeDeleted" } ] }
    }
  },
  {
    $unwind: "$_idsNeedsToBeDeleted" // Unwind `_idsNeedsToBeDeleted`
  },
  /** Group without a condition & push all `_idsNeedsToBeDeleted` fields to an array */
  {
    $group: { _id: "", _idsNeedsToBeDeleted: { $push: "$_idsNeedsToBeDeleted" } }
  },
  { $project : { _id : 0 } } // Optional stage
  /** At the end you'll have an [{ _idsNeedsToBeDeleted: [_ids] }] or [] */
])
Test : mongoplayground
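To see what this pipeline selects, here is a plain JavaScript stand-in that runs on the sample docs from the question, with no database involved. It is only an illustration: idsToDelete is a hypothetical name, and it uses the id field from the question's sample instead of _id.

```javascript
// In-memory illustration of the Step 2 logic; not a MongoDB call.
function idsToDelete(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Composite key on name + value; JSON keeps types apart (true vs "true").
    const key = JSON.stringify([doc.name, doc.value]);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc.id);
  }
  const result = [];
  for (const ids of groups.values()) {
    result.push(...ids.slice(1)); // keep the first id of each group, collect the rest
  }
  return result;
}

const sample = [
  { name: "duplicate", value: true, id: 2910921 },
  { name: "duplicate", value: true, id: 32838293 },
  { name: "duplicate", value: false, id: 3283232 },
  { name: "notDuplicate", value: true, id: 382932 }
];

console.log(idsToDelete(sample)); // [ 32838293 ]
```

In MongoDB the order of elements pushed within a group is not guaranteed without a preceding $sort, so which of the two duplicates survives may differ from this in-memory version.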
.deleteMany() - delete all docs :
Query :
db.collection.deleteMany( { "_id" : {$in : [_ids]} } );
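Consideration: prior to .deleteMany(), check that the aggregation result is not an empty array [] and that it has a doc with an _idsNeedsToBeDeleted field which is an array. Also, since we're matching against _id in the DB, the values in the $in array must have the same type as the stored _id. The Node driver returns ObjectId instances from an aggregation, but if the _ids have been turned into strings along the way (for example after being serialized to JSON), iterate over the array, convert each string to ObjectId() and use that array of ObjectId()s in the delete query.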
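Put together, the aggregation, the empty-result check and the delete might look like this with the Node driver. Treat it as a sketch: idsPipeline and deleteDuplicates are hypothetical names, collection is assumed to be a Collection obtained from db.collection(collectionName), and the _ids are assumed to come straight from aggregate() without passing through JSON.

```javascript
// Same stages as the Step 2 query, written compactly.
const idsPipeline = [
  { $group: { _id: { name: "$name", value: "$value" }, _idsNeedsToBeDeleted: { $push: "$$ROOT._id" } } },
  { $project: { _id: 0, _idsNeedsToBeDeleted: { $slice: ["$_idsNeedsToBeDeleted", 1, { $size: "$_idsNeedsToBeDeleted" }] } } },
  { $unwind: "$_idsNeedsToBeDeleted" },
  { $group: { _id: "", _idsNeedsToBeDeleted: { $push: "$_idsNeedsToBeDeleted" } } },
  { $project: { _id: 0 } }
];

// Hypothetical helper: resolves to the number of deleted documents.
async function deleteDuplicates(collection) {
  const result = await collection.aggregate(idsPipeline).toArray();

  // No duplicates: the aggregation returns [] and there is nothing to delete.
  if (result.length === 0 || !Array.isArray(result[0]._idsNeedsToBeDeleted)) {
    return 0;
  }

  const ids = result[0]._idsNeedsToBeDeleted;
  const { deletedCount } = await collection.deleteMany({ _id: { $in: ids } });
  return deletedCount;
}
```

Returning 0 early avoids sending a deleteMany() with an undefined $in list when the collection has no duplicates.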
Note :
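Irrespective of which step you choose, since we're grouping on name + value, you need to make sure all of your docs have those fields. $group treats a missing field as null, so docs that lack both fields would all land in the same group and be handled as duplicates of each other.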
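One way to guard against docs without those fields is to put a $match stage in front of the grouping. The helper below is a hypothetical sketch (requireFields is not part of any driver API); it only builds the pipeline array.

```javascript
// Hypothetical helper: prepend a $match that keeps only docs having every listed field.
function requireFields(pipeline, fields) {
  const match = {};
  for (const field of fields) {
    match[field] = { $exists: true };
  }
  return [{ $match: match }, ...pipeline];
}

const grouped = requireFields(
  [{ $group: { _id: { name: "$name", value: "$value" }, _idsNeedsToBeDeleted: { $push: "$$ROOT._id" } } }],
  ["name", "value"]
);
// grouped[0] is { $match: { name: { $exists: true }, value: { $exists: true } } }
```

This fits Step 2, where docs skipped by $match are simply left alone. In Step 1 the same $match would drop those docs from the $out collection, so there it is safer to fix or back-fill the fields first.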