I'm using MongoDB with Ruby using mongo
gem.
I have the following scenario:
coll1
, look at key1
and key2
coll2
with matching values for key1
and key2
key3
whose value be set to value of key3
in the document referenced in #1coll3
The general guideline with MongoDB has been to handle cross collection operations in application code.
So I do the following:
client = Mongo::Client.new([ '127.0.0.1:27017' ], :database => some_db,
:server_selection_timeout => 5)
cursor = client[:coll1].find({}, { :projection => {:_id => 0} }) # exclude _id
cursor.each do |doc|
doc_coll2 = client[:coll2].find('$and' => [{:key1 => doc[:key1]}, {:key2 => doc[:key2] }]).limit(1).first # no find_one method
if(doc_coll2 && doc[:key3])
doc_coll2[:key3] = doc[:key3]
doc_coll2.delete(:_id) # remove key :_id
client[:coll3].insert_one(doc_coll2)
end
end
This works, but it takes a lot of time to finish this job - approximately 250ms per document in collection coll1
or 3600s (1 hour) for ~15000 records, which seems a lot, which could be associated with reading the document one at a time, do the check in app code and then writing one doc at a time back to a new collection.
Is there a way to get this operation be done faster? Is the way I'm doing even the right way to do it?
Example documents
coll1
{
"_id" : ObjectId("588610ead0ae360cb815e55f"),
"key1" : "115384042",
"key2" : "276209",
"key3" : "10101122317876"
}
coll2
{
"_id" : ObjectId("788610ead0ae360def15e88e"),
"key1" : "115384042",
"key2" : "276209",
"key4" : 10,
"key5" : 4,
"key6" : 0,
"key7" : "false",
"key8" : 0,
"key9" : "false"
}
coll3
{
"_id" : ObjectId("788610ead0ae360def15e88e"),
"key1" : "115384042",
"key2" : "276209",
"key3" : "10101122317876",
"key4" : 10,
"key5" : 4,
"key6" : 0,
"key7" : "false",
"key8" : 0,
"key9" : "false"
}
A solution would be to use aggregation instead, and do this in one single query:
$lookup
$unwind
coll1.key2 == coll2.key2
with $redact
$project
$out
so the query would be :
db.coll1.aggregate([
{ "$lookup": {
"from": "coll2",
"localField": "key1",
"foreignField": "key1",
"as": "coll2_doc"
}},
{ "$unwind": "$coll2_doc" },
{ "$redact": {
"$cond": [
{ "$eq": [ "$key2", "$coll2_doc.key2" ] },
"$$KEEP",
"$$PRUNE"
]
}},
{
$project: {
key1: 1,
key2: 1,
key3: 1,
key4: "$coll2_doc.key4",
key5: "$coll2_doc.key5",
key6: "$coll2_doc.key6",
key7: "$coll2_doc.key7",
key8: "$coll2_doc.key8",
key9: "$coll2_doc.key9",
}
},
{$out: "coll3"}
], {allowDiskUse: true} );
and db.coll3.find()
would return
{
"_id" : ObjectId("588610ead0ae360cb815e55f"),
"key1" : "115384042",
"key2" : "276209",
"key3" : "10101122317876",
"key4" : 10,
"key5" : 4,
"key6" : 0,
"key7" : "false",
"key8" : 0,
"key9" : "false"
}
If you don't want to specify all keys in the $project
stage, you can take advantage of $addFields
and $replaceRoot
, two new operators introduced in MongoDB 3.4
the query would become:
db.coll1.aggregate([
{ "$lookup": {
"from": "coll2",
"localField": "key1",
"foreignField": "key1",
"as": "coll2_doc"
}},
{ "$unwind": "$coll2_doc" },
{ "$redact": {
"$cond": [
{ "$eq": [ "$key2", "$coll2_doc.key2" ] },
"$$KEEP",
"$$PRUNE"
]
}},
{$addFields: {"coll2_doc.key3": "$key3" }},
{$replaceRoot: {newRoot: "$coll2_doc"}},
{$out: "coll3"}
], {allowDiskUse: true})
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With