I have two existing collections and need to populate a third collection based on the comparison between the two existing.
The two collections that need to be compared have the following schema:
// Settings collection:
{
"Identifier":"ABC123",
"C":"1",
"U":"V",
"Low":116,
"High":124,
"ImportLogId":1
}
// Data collection
{
"Identifier":"ABC123",
"C":"1",
"U":"V",
"Date":"11/6/2013 12AM",
"Value":128,
"ImportLogId": 1
}
I am new to MongoDB and NoSQL in general so I am having a tough time grasping how to do this. The SQL would look something like this:
SELECT s.Identifier, r.ReadValue, r.U, r.C, r.Date
FROM Settings s
JOIN Reads r
ON s.Identifier = r.Identifier
AND s.C = r.C
AND s.U = r.U
WHERE (r.Value <= s.Low OR r.Value >= s.High)
In this case using the sample data, I would want to return a record because the value from the Data collection is greater than the high value from the setting collection. Is this possible using Mongo queries or map reduce, or is this bad collection structure (i.e. maybe all of this should be in one collection)?
A few more additional notes: The Settings collection should really only have 1 record per "Identifier". The Data collection will have many records per "Identifier". This process could potentially be scanning hundreds of thousands of documents at one time, so resource consideration is somewhat important
There is no good way of performing operation like this using MongoDB. If you want BAD way you can use code like this:
db.settings.find().forEach(
function(doc) {
data = db.data.find({
Identifier: doc.Idendtifier,
C: doc.C,
U: doc.U,
$or: [{Value: {$lte: doc.Low}}, {Value: {$gte: doc.High}}]
}).toArray();
// Do what you need
}
)
but don't expect it will perform even remotely as good as any decent RDBMS.
You could rebuild your schema and embed documents from data collection like this:
{
"_id" : ObjectId("527a7f4b07c17a1f8ad009d2"),
"Identifier" : "ABC123",
"C" : "1",
"U" : "V",
"Low" : 116,
"High" : 124,
"ImportLogId" : 1,
"Data" : [
{
"Date" : ISODate("2013-11-06T00:00:00Z"),
"Value" : 128
},
{
"Date" : ISODate("2013-10-09T00:00:00Z"),
"Value" : 99
}
]
}
It may work if number of embedded document is low but to be honest working with arrays of documents is far from being pleasant experience. Not even mention that you can easily hit document size limit with growing size of the Data array.
If this kind of operations is typical for your application I would consider using different solution. As much as I like MongoDB it works well only with certain type of data and access patterns.
Without the concept of JOIN, you must change your approach and denormalize.
In your case, looks like you're doing a data log validation. My advice is looping settings collection and with each of them use the findAndModify operator in order to set a validation flag on data collection records who matches; after that, you could just use the find operator on the data collection, filtering by the new flag.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With