
Comparing documents between two MongoDB collections

Tags: join, mongodb

I have two existing collections and need to populate a third collection based on a comparison between the two.

The two collections that need to be compared have the following schema:

// Settings collection:
{
  "Identifier":"ABC123",
  "C":"1",
  "U":"V",
  "Low":116,
  "High":124,
  "ImportLogId":1
}

// Data collection
{
  "Identifier":"ABC123",
  "C":"1",
  "U":"V",
  "Date":"11/6/2013 12AM",
  "Value":128,
  "ImportLogId": 1
}

I am new to MongoDB and NoSQL in general so I am having a tough time grasping how to do this. The SQL would look something like this:

SELECT s.Identifier, r.Value, r.U, r.C, r.Date
FROM Settings s
JOIN Data r
  ON s.Identifier = r.Identifier
  AND s.C = r.C
  AND s.U = r.U
WHERE (r.Value <= s.Low OR r.Value >= s.High)

In this case, using the sample data, I would want to return a record because the value from the Data collection is greater than the High value from the Settings collection. Is this possible using Mongo queries or map-reduce, or is this a bad collection structure (i.e. maybe all of this should be in one collection)?

A few additional notes: the Settings collection should really only have 1 record per "Identifier". The Data collection will have many records per "Identifier". This process could potentially scan hundreds of thousands of documents at a time, so resource consumption is somewhat important.

TechDawg270 asked Nov 06 '13 16:11


2 Answers

There is no good way to perform an operation like this in MongoDB. If you want a BAD way, you can use code like this:

db.settings.find().forEach(
    function(doc) {
        // Fetch the Data documents that match this setting and fall outside its range
        var data = db.data.find({
            Identifier: doc.Identifier,
            C: doc.C,
            U: doc.U,
            $or: [{Value: {$lte: doc.Low}}, {Value: {$gte: doc.High}}]
        }).toArray();
        // Do what you need
    }
)

but don't expect it to perform even remotely as well as any decent RDBMS.

You could rebuild your schema and embed the documents from the data collection like this:

{
    "_id" : ObjectId("527a7f4b07c17a1f8ad009d2"),
    "Identifier" : "ABC123",
    "C" : "1",
    "U" : "V",
    "Low" : 116,
    "High" : 124,
    "ImportLogId" : 1,
    "Data" : [
        {
            "Date" : ISODate("2013-11-06T00:00:00Z"),
            "Value" : 128
        },
        {
            "Date" : ISODate("2013-10-09T00:00:00Z"),
            "Value" : 99
        }
    ]
}

It may work if the number of embedded documents is low, but to be honest, working with arrays of documents is far from a pleasant experience. Not to mention that you can easily hit the document size limit as the Data array grows.
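With the embedded schema, the range check no longer needs a second collection, because each reading sits next to its own Low and High. A minimal sketch of that check in plain JavaScript follows. The helper name outOfRangeReadings is hypothetical, a plain Date stands in for ISODate, and the third reading (120) is an assumption added to show an in-range value being skipped:

```javascript
// Hypothetical helper: returns the embedded readings that fall outside
// the Low/High band of their parent settings document.
function outOfRangeReadings(settingsDoc) {
  const readings = settingsDoc.Data || [];
  return readings
    .filter((r) => r.Value <= settingsDoc.Low || r.Value >= settingsDoc.High)
    .map((r) => ({
      Identifier: settingsDoc.Identifier,
      C: settingsDoc.C,
      U: settingsDoc.U,
      Date: r.Date,
      Value: r.Value,
    }));
}

// Sample document shaped like the embedded schema above.
const sample = {
  Identifier: "ABC123",
  C: "1",
  U: "V",
  Low: 116,
  High: 124,
  ImportLogId: 1,
  Data: [
    { Date: new Date("2013-11-06T00:00:00Z"), Value: 128 },
    { Date: new Date("2013-10-09T00:00:00Z"), Value: 99 },
    { Date: new Date("2013-10-10T00:00:00Z"), Value: 120 }, // assumed in-range reading
  ],
};

const flagged = outOfRangeReadings(sample);
console.log(flagged.map((r) => r.Value));
```

In the shell, the same function could be called from db.settings.find().forEach(...), with the results inserted into the third collection. The whole Data array still has to be loaded for every settings document, so the size concern above applies.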

If this kind of operation is typical for your application, I would consider using a different solution. As much as I like MongoDB, it works well only with certain types of data and access patterns.

zero323 answered Oct 06 '22 02:10


Without the concept of JOIN, you must change your approach and denormalize.

In your case, it looks like you're validating a data log. My advice is to loop over the settings collection and, for each setting, use findAndModify to set a validation flag on the matching records in the data collection. After that, you can simply run find on the data collection, filtering by the new flag.
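The flagging approach could be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: the helper buildFlagUpdate and the flag field OutOfRange are hypothetical names, and the shell usage in the comments swaps in a multi-document update because findAndModify modifies only a single document per call:

```javascript
// Hypothetical helper: turns one Settings document into the filter/update
// pair that flags the matching out-of-range Data documents.
function buildFlagUpdate(setting) {
  return {
    filter: {
      Identifier: setting.Identifier,
      C: setting.C,
      U: setting.U,
      $or: [
        { Value: { $lte: setting.Low } },
        { Value: { $gte: setting.High } },
      ],
    },
    // OutOfRange is an assumed flag name; any unused field name would do.
    update: { $set: { OutOfRange: true } },
  };
}

const op = buildFlagUpdate({
  Identifier: "ABC123",
  C: "1",
  U: "V",
  Low: 116,
  High: 124,
  ImportLogId: 1,
});
console.log(JSON.stringify(op.filter));

// Usage in the mongo shell, assuming collections named settings and data:
// db.settings.find().forEach(function (s) {
//   var op = buildFlagUpdate(s);
//   db.data.update(op.filter, op.update, { multi: true });
// });
// db.data.find({ OutOfRange: true });
```

Since Settings holds one record per Identifier, this issues one update per setting rather than one per reading. An index on the Data fields used in the filter would likely help at the stated volume.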

dbra answered Oct 06 '22 00:10