I have two mongo collections, one which refers to the production env and the other to the testing env.
How can I compare a diff between my two collections?
I tried dumping them to a bson and then converting to a json. But I cant just perform a simple diff on them since the sorting might vary and the json file is way too large to be sorted.
For performing MongoDB Join two collections, you must use the $lookup operator. It is defined as a stage that executes a left outer join with another collection and aids in filtering data from joined documents. For example, if a user requires all grades from all students, then the below query can be written: Students.
Collections: plural in lower case: images, resumes, Document fields: lowerCamelCase, e.g. memberFirstName, fileName, etc.
Try the following in the shell, it will iterate each item within a collection and try to match each document based on ID.
Say we have 2 collections db.col1
and db.col2
:
> db.col1.find() { "_id" : 1, "item" : 1 } { "_id" : 2, "item" : 2 } { "_id" : 3, "item" : 3 } { "_id" : 4, "item" : 4 } > db.col2.find() { "_id" : 1, "item" : 1 } { "_id" : 2, "item" : 2 } { "_id" : 3, "item" : 3 } { "_id" : 4, "item" : 4 }
We can then create a javascript function to compare 2 collections
function compareCollection(col1, col2){ if(col1.count() !== col2.count()){ return false; } var same = true; var compared = col1.find().forEach(function(doc1){ var doc2 = col2.findOne({_id: doc1._id}); same = same && JSON.stringify(doc1)==JSON.stringify(doc2); }); return same; }
Then call is like the following:
> compareCollection(db.col1, db.col2) true
If we then have a 3rd collections db.col3
> db.col3.find() { "_id" : 1, "item" : 1 }
And compare this one
> compareCollection(db.col1, db.col3) false
we'll get the expected result.
If we also have a 4th collection which has matching documents but diffrent data db.col4
> db.col4.find() { "_id" : 1, "item" : 10 } { "_id" : 2, "item" : 2 } { "_id" : 3, "item" : 3 } { "_id" : 4, "item" : 4 }
This will also return false
> compareCollection(db.col1, db.col4) false
Starting Mongo 4.4
, the aggregation framework provides a new $unionWith
stage, performing the union of two collections (the combined pipeline results from two collections into a single result set).
Making it way easier to find the diff between two collections:
// > db.test.find() // { "a" : 9, "b" : 2 } // { "a" : 4, "b" : 12 } // { "a" : 3, "b" : 5 } // { "a" : 0, "b" : 7 } // { "a" : 7, "b" : 12 } // > db.prod.find() // { "a" : 3, "b" : 5 } // { "a" : 4, "b" : 12 } // { "a" : 3, "b" : 5 } // { "a" : 0, "b" : 7 } db.test.aggregate( { $unset: "_id" }, { $project: { from: "test", doc: "$$ROOT" } }, { $unionWith: { coll: "prod", pipeline: [ { $unset: "_id" }, { $project: { from: "prod", doc: "$$ROOT" } } ] }}, { $group: { _id: "$doc", test: { $sum: { $cond: [ { $eq: ["$from", "test"] }, 1, 0 ] } }, prod: { $sum: { $cond: [ { $eq: ["$from", "prod"] }, 1, 0 ] } } }}, { $match: { $expr: { $ne: ["$test", "$prod"] } } } ) // { "_id" : { "a" : 7, "b" : 12 }, "test" : 1, "prod" : 0 } // { "_id" : { "a" : 9, "b" : 2 }, "test" : 1, "prod" : 0 } // { "_id" : { "a" : 3, "b" : 5 }, "test" : 1, "prod" : 2 }
This:
$unset
s the _id
in order to latter be able to $group
documents by themselves without considering the _id
(as it might be different in the other collection).$project
s the field from
whose value is the collection the document comes from (test
or prod
), in order, latter when we merge the two collections, to keep track from where documents originated from.$project
s the field doc
whose value is the document itself (thanks to the $$ROOT
variable). This is the field that will be used to $group
documents together.$unionWith
the prod
collection in order to merge documents from both collections into the same aggregation pipeline. The pipeline
parameter is an optional aggregation pipeline applied on documents from the collection being merged (prod
) before documents are inserted into the downstream pipeline. And we're applying the same $unset
/$project
stages we've applied on test
documents.$group
s test
and prod
documents based on the doc
field that we've created to represent the actual document. And we accumulate the two fields test
and prod
as the $sum
(count) of grouped documents originating from one or the other collection (via $cond
if expressions).$match
es resulting grouped elements by only keeping items not having the same number of test and prod documents: the actual diff between the two collections.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With