Here is the scenario:
Currently the best I could come up with uses an aggregation. The stages look something like this:

1. $match on 2 indexed values in the first collection
2. $sort
3. $lookup with a pipeline that matches on the relationship property in both collections AND matches the potential search value against an indexed value in the second collection
4. $match with an OR that checks 2 search fields in the first collection using regex, or whether the projection from the lookup contained any results
5. $limit
6. $project the final values

The concern is that the search will join every document in the first collection with the second collection during the lookup. Keep in mind that everything being searched is indexed, but the lookup is the major concern here. Any suggestions on how to do this the right way? Is there a better way?
Code example:
db.collection1.aggregate([
  {
    $match: { // initial filters based on indexed values
      field1: "somevalue",
      field2: "somevalue"
    }
  },
  {
    $sort: {
      firstSortField: -1, _id: -1 // sort results by needed order
    }
  },
  {
    $lookup: { // join with another collection to search on a specific value
      from: "collection2",
      localField: "someLocalField",
      foreignField: "someForeignField",
      as: "someJoinedFields"
    }
  },
  {
    $addFields: {
      extraField: "$someJoinedFields.someExtraField" // potential array of values
    }
  },
  {
    $match: {
      $or: [
        { field3: { $regex: "" } }, // potential search field
        { field4: { $regex: "" } }, // potential search field
        { extraField: { $regex: "" } } // potential search field
      ]
    }
  },
  {
    $limit: 100 // limit to 100 results for pagination
  },
  {
    $project: { // final results
      finalField: 1,
      finalField2: 1,
      finalField3: 1
    }
  }
])
Sadly, your schemas do not efficiently fit your need, for the following reasons:

- You sort on _id, which is good as it's guaranteed to be unique.
- Pagination needs a $skip (not in your example) and a $limit (as you said :D).
- The search $match has to run after the $lookup and before the $limit (as you currently do :D).

Basically, you are asking for something which allows you to do $skip before $lookup, $match before $skip, and $lookup before $match. This is not possible!
All the solutions that come to mind are actually difficult to implement.
Solution 1: embed the documents
https://docs.mongodb.com/manual/tutorial/model-embedded-one-to-many-relationships-between-documents/
One of the best things about MongoDB is how easy it is to change the structure of documents. If you can do so without destroying some other feature, do it. By embedding the documents from the second collection, you don't need the $lookup anymore, removing the problem completely (you can then even make the limit far larger, as the only "slow" thing left is the collect phase).
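With embedding, the whole aggregation collapses to a single-collection query. A minimal sketch, assuming the relevant collection2 fields are copied into each collection1 document as an embeddedDocs array (all field and collection names here are hypothetical placeholders):

```javascript
// Hypothetical embedded schema for collection1:
// { field1, field2, field3, field4, firstSortField, embeddedDocs: [{ someExtraField }] }
const embeddedSearchPipeline = [
  { $match: { field1: "somevalue", field2: "somevalue" } }, // indexed filters
  { $sort: { firstSortField: -1, _id: -1 } },
  { $match: { $or: [ // the search hits the embedded array directly -- no $lookup
    { field3: { $regex: "searchterm" } },
    { field4: { $regex: "searchterm" } },
    { "embeddedDocs.someExtraField": { $regex: "searchterm" } }
  ] } },
  { $limit: 100 },
  { $project: { finalField: 1, finalField2: 1, finalField3: 1 } }
];

// db.collection1.aggregate(embeddedSearchPipeline);
```

The aggregate call is commented out because it needs a live database; the pipeline itself is just a plain array of stage objects.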
Solution 2: a materialized view
https://docs.mongodb.com/manual/core/materialized-views/
This lets you keep the original structure while getting the speed of the embedded query. The trade-off is slower writes, as you have to refresh the view on each insert or update in either collection, but reads will be faster.
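On MongoDB 4.2+, $merge gives a convenient way to refresh such an on-demand materialized view. A sketch, with hypothetical collection and field names:

```javascript
// Pre-join collection1 with collection2 into a "searchView" collection,
// so reads can query searchView directly without any $lookup.
const refreshViewPipeline = [
  { $lookup: {
      from: "collection2",
      localField: "someLocalField",
      foreignField: "someForeignField",
      as: "someJoinedFields"
  } },
  { $addFields: { extraField: "$someJoinedFields.someExtraField" } },
  // $merge upserts the results into the view instead of returning a cursor
  { $merge: { into: "searchView", on: "_id",
              whenMatched: "replace", whenNotMatched: "insert" } }
];

// Re-run after writes (or on a schedule) to keep the view fresh:
// db.collection1.aggregate(refreshViewPipeline);
```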
Solution 3: $lookup with a correlated subquery
https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#correlated-subqueries-using-concise-syntax
With this method, you will merge far fewer documents, and you can discard the non-matches easily.
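A sketch of that shape, pushing the search filter into the $lookup's own pipeline so non-matching collection2 documents are discarded during the join rather than after it (the concise localField/foreignField + pipeline combination needs MongoDB 5.0+; field names are hypothetical):

```javascript
// Only collection2 documents that match BOTH the join key and the
// indexed search value end up in someJoinedFields.
const filteredLookupStage = {
  $lookup: {
    from: "collection2",
    localField: "someLocalField",
    foreignField: "someForeignField",
    pipeline: [
      { $match: { indexedSearchField: { $regex: "searchterm" } } }
    ],
    as: "someJoinedFields"
  }
};

// Afterwards, keep documents that matched locally OR via the join:
const postLookupMatch = { $match: { $or: [
  { field3: { $regex: "searchterm" } },
  { field4: { $regex: "searchterm" } },
  { "someJoinedFields.0": { $exists: true } } // the join found at least one match
] } };
```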
The way to go is to change the db structure; I understand, though, that this can be impossible for various reasons. Use the materialized view if you don't have many write operations and have no space problems (as it will occupy double the space). Use the $lookup if you cannot change the structure and cannot allow slower writes. It is by far the slowest of the three methods, but it will still give you a boost. In the long term this method might not be a solution at all; it all depends on what your use cases are :D