Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compare a mongo diff on two collections

I have two mongo collections, one which refers to the production env and the other to the testing env.

How can I compare a diff between my two collections?

I tried dumping them to a bson and then converting to a json. But I cant just perform a simple diff on them since the sorting might vary and the json file is way too large to be sorted.

like image 238
Karan Jain Avatar asked Dec 19 '16 12:12

Karan Jain


People also ask

How does MongoDB relate two collections?

For performing MongoDB Join two collections, you must use the $lookup operator. It is defined as a stage that executes a left outer join with another collection and aids in filtering data from joined documents. For example, if a user requires all grades from all students, then the below query can be written: Students.

Should MongoDB collections be plural?

Collections: plural in lower case: images, resumes, Document fields: lowerCamelCase, e.g. memberFirstName, fileName, etc.


2 Answers

Try the following in the shell, it will iterate each item within a collection and try to match each document based on ID.

Say we have 2 collections db.col1 and db.col2:

> db.col1.find() { "_id" : 1, "item" : 1 } { "_id" : 2, "item" : 2 } { "_id" : 3, "item" : 3 } { "_id" : 4, "item" : 4 }  > db.col2.find() { "_id" : 1, "item" : 1 } { "_id" : 2, "item" : 2 } { "_id" : 3, "item" : 3 } { "_id" : 4, "item" : 4 } 

We can then create a javascript function to compare 2 collections

function compareCollection(col1, col2){     if(col1.count() !== col2.count()){         return false;     }      var same = true;      var compared = col1.find().forEach(function(doc1){         var doc2 = col2.findOne({_id: doc1._id});          same = same && JSON.stringify(doc1)==JSON.stringify(doc2);     });      return same; } 

Then call is like the following:

> compareCollection(db.col1, db.col2) true 

If we then have a 3rd collections db.col3

> db.col3.find() { "_id" : 1, "item" : 1 } 

And compare this one

> compareCollection(db.col1, db.col3) false 

we'll get the expected result.

If we also have a 4th collection which has matching documents but diffrent data db.col4

> db.col4.find() { "_id" : 1, "item" : 10 } { "_id" : 2, "item" : 2 } { "_id" : 3, "item" : 3 } { "_id" : 4, "item" : 4 } 

This will also return false

> compareCollection(db.col1, db.col4) false 
like image 87
Kevin Smith Avatar answered Oct 04 '22 05:10

Kevin Smith


Starting Mongo 4.4, the aggregation framework provides a new $unionWith stage, performing the union of two collections (the combined pipeline results from two collections into a single result set).

Making it way easier to find the diff between two collections:

// > db.test.find() //    { "a" : 9, "b" : 2  } //    { "a" : 4, "b" : 12 } //    { "a" : 3, "b" : 5  } //    { "a" : 0, "b" : 7  } //    { "a" : 7, "b" : 12 } // > db.prod.find() //    { "a" : 3, "b" : 5  } //    { "a" : 4, "b" : 12 } //    { "a" : 3, "b" : 5  } //    { "a" : 0, "b" : 7  } db.test.aggregate(   { $unset: "_id" },   { $project: { from: "test", doc: "$$ROOT" } },   { $unionWith: {       coll: "prod",       pipeline: [         { $unset: "_id" },         { $project: { from: "prod", doc: "$$ROOT" } }       ]   }},   { $group: {       _id: "$doc",       test: { $sum: { $cond: [ { $eq: ["$from", "test"] }, 1, 0 ] } },       prod: { $sum: { $cond: [ { $eq: ["$from", "prod"] }, 1, 0 ] } }   }},   { $match: { $expr: { $ne: ["$test", "$prod"] } } } ) // { "_id" : { "a" : 7, "b" : 12 }, "test" : 1, "prod" : 0 } // { "_id" : { "a" : 9, "b" : 2  }, "test" : 1, "prod" : 0 } // { "_id" : { "a" : 3, "b" : 5  }, "test" : 1, "prod" : 2 } 

This:

  • $unsets the _id in order to latter be able to $group documents by themselves without considering the _id (as it might be different in the other collection).
  • $projects the field from whose value is the collection the document comes from (test or prod), in order, latter when we merge the two collections, to keep track from where documents originated from.
  • Also $projects the field doc whose value is the document itself (thanks to the $$ROOT variable). This is the field that will be used to $group documents together.
  • $unionWith the prod collection in order to merge documents from both collections into the same aggregation pipeline. The pipeline parameter is an optional aggregation pipeline applied on documents from the collection being merged (prod) before documents are inserted into the downstream pipeline. And we're applying the same $unset/$project stages we've applied on test documents.
  • $groups test and prod documents based on the doc field that we've created to represent the actual document. And we accumulate the two fields test and prod as the $sum (count) of grouped documents originating from one or the other collection (via $cond if expressions).
  • $matches resulting grouped elements by only keeping items not having the same number of test and prod documents: the actual diff between the two collections.
like image 36
Xavier Guihot Avatar answered Oct 04 '22 06:10

Xavier Guihot