I am using $lookup in PyMongo to successfully "join" two collections (this works). The problem is that when the second collection returns all of its matching records, the joined document can exceed the 16 MB BSON document size limit.
I am looking to use $limit to cap the number of records that are joined into "match_docs", e.g. a maximum of 100 records from "comments" per obj_id:
db.indicators.aggregate([
  {
    "$lookup": {
      "from": "comments",
      "localField": "_id",
      "foreignField": "obj_id",
      "as": "match_docs"
    }
  }
])
I've tried various types of $limit, and it seems to only limit the total number of results overall, not just for the join.
$lookup performs an equality match on the localField to the foreignField from the documents of the from collection. If an input document does not contain the localField, $lookup treats the field as having a value of null for matching purposes.
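As an illustration of that null-matching behavior, here is a minimal sketch; the "reports"/"users" collections and their fields are hypothetical, not part of the original question:

// Documents in 'reports' that lack 'author_id' will be joined with
// 'users' documents whose 'user_id' is null or missing.
db.reports.aggregate([
  {
    "$lookup": {
      "from": "users",
      "localField": "author_id",
      "foreignField": "user_id",
      "as": "author_docs"
    }
  }
])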
To limit the number of documents returned by a query in MongoDB, you use the limit() method on a cursor. It accepts a single numeric argument: the maximum number of documents to return. If you need to return results only after skipping over a certain number of documents, combine it with skip().
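For example, a minimal sketch of cursor pagination (the collection name and page size are assumptions):

// Skip the first 200 comments and return at most 100 of them
db.comments.find().skip(200).limit(100)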
On large collections of millions of documents, MongoDB's aggregation pipeline has been benchmarked as significantly slower than Elasticsearch. Performance degrades further with collection size once MongoDB starts hitting the disk because of limited system RAM, and a $lookup stage run without a supporting index can be very slow.
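In particular, indexing the foreignField in the foreign collection is usually the first step; a minimal sketch, assuming the field names from the question:

// Index the field that $lookup matches on in the 'comments' collection
db.comments.createIndex({ "obj_id": 1 })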
Starting with MongoDB 3.6 you can use the $lookup pipeline syntax (a subquery defined with let and a sub-pipeline) to limit the joined documents:
db.indicators.aggregate([
  { $lookup: {
      from: 'comments',
      as: 'match_docs',
      let: { indicator_id: '$_id' },
      pipeline: [
        { $match: {
            $expr: { $eq: [ '$obj_id', '$$indicator_id' ] }
        } },
        // { $sort: { createdAt: 1 } }, // add a sort if needed (e.g. to get the first 100 comments by creation date)
        { $limit: 100 }
      ]
  } }
])
If you place a $unwind immediately after a $lookup, the pipeline will be optimized: the two stages are coalesced, which helps to bypass the 16 MB limit that could result from $lookup returning a large array of documents.
Keep in mind that if the size of a single document in the foreign collection plus the size of the matching document in the local collection exceeds 16 MB, this optimization cannot help.
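A minimal sketch of that $lookup + $unwind pattern, reusing the collections from the question (the preserveNullAndEmptyArrays option is just one way to keep indicators that have no comments):

db.indicators.aggregate([
  {
    $lookup: {
      from: 'comments',
      localField: '_id',
      foreignField: 'obj_id',
      as: 'match_docs'
    }
  },
  // Placing $unwind directly after $lookup lets the server coalesce the two stages,
  // streaming the joined documents instead of materializing one large array
  {
    $unwind: {
      path: '$match_docs',
      preserveNullAndEmptyArrays: true
    }
  }
])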