 

MongoDB Aggregation Limit Lookup

I am using $lookup in PyMongo to "join" two collections, and the join itself works. The problem is that when the second collection returns all of its records, the resulting document can exceed the 16 MB BSON document size limit.

I would like to use $limit to cap the number of records joined under "match_docs", e.g. at most 100 records from "comments" per obj_id:

db.indicators.aggregate([
  {
    "$lookup": {
      "from": "comments",
      "localField": "_id",
      "foreignField": "obj_id",
      "as": "match_docs"
    }
  }
])

I've tried placing $limit in various ways, but it only seems to limit the total number of results overall, not the number of documents joined per record.

asked Jun 07 '17 by gleb1783

People also ask

What are lookup in aggregation for MongoDB?

$lookup performs an equality match on the localField to the foreignField from the documents of the from collection. If an input document does not contain the localField, the $lookup treats the field as having a value of null for matching purposes.
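For illustration, a minimal PyMongo sketch of that matching behavior, using hypothetical orders/customers collections and a local connection (all names here are assumptions, not from the question):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
db = client["test"]                                # database name is an assumption

db.orders.insert_many([
    {"_id": 1, "item": "a", "cust_id": 7},
    {"_id": 2, "item": "b"},                       # no localField ("cust_id")
])
db.customers.insert_many([
    {"_id": 10, "cust_id": 7, "name": "acme"},
    {"_id": 11, "cust_id": None, "name": "anon"},
])

# Order 1 joins the customer with cust_id 7; order 2 has no cust_id, so it is
# treated as null and joins the customer whose cust_id is null.
result = list(db.orders.aggregate([
    {"$lookup": {
        "from": "customers",
        "localField": "cust_id",
        "foreignField": "cust_id",
        "as": "cust_docs",
    }}
]))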

How do I limit the number of records in MongoDB?

To limit the records returned in MongoDB, use the limit() method. It accepts one number-type argument: the number of documents you want returned.

What is the use of the limit() and skip() methods?

The limit() method in MongoDB specifies the maximum number of results to return; it takes a single parameter, the desired number of documents. Sometimes you need results only after a certain number of documents have been passed over, and skip() does that job.
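As a rough PyMongo sketch of paging with skip() and limit() (collection and field names here are placeholders, and the connection details are assumptions):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
db = client["test"]                                # database name is an assumption

# Skip the first 20 comments and return at most 10 of the rest
# (i.e. "page 3" with a page size of 10).
cursor = db.comments.find().sort("created_at", 1).skip(20).limit(10)
for doc in cursor:
    print(doc)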

Is MongoDB aggregation fast?

On large collections of millions of documents, MongoDB's aggregation has been shown to perform much worse than Elasticsearch. Performance degrades with collection size once MongoDB starts using the disk because of limited system RAM. A $lookup stage used without supporting indexes can be very slow.
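If a $lookup like the one in the question is slow, indexing the foreignField in the joined collection is the usual first step. A hedged PyMongo sketch, reusing the comments collection and obj_id field from the question (connection string and database name are assumptions):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
db = client["test"]                                # database name is an assumption

# Index the field $lookup matches on in the "from" collection, so each
# per-document lookup is an index seek instead of a full collection scan.
db.comments.create_index("obj_id")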


2 Answers

Starting in MongoDB 3.6 you can use the pipeline form of $lookup (a correlated subquery with let and pipeline) to limit what the join returns:

db.indicators.aggregate([
{ $lookup: {
  from: 'comments',
  as: 'match_docs',
  let: { indicator_id: '$_id' },
  pipeline: [
    { $match: {
      $expr: { $eq: [ '$obj_id', '$$indicator_id' ] }
    } },
    // { $sort: { createdAt: 1 } }, // add sort if needed (for example, if you want first 100 comments by creation date)
    { $limit: 100 }
  ]
} }
])
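Since the question uses PyMongo, the same pipeline might look roughly like this in Python (a sketch; the connection string and database name are assumptions):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
db = client["test"]                                # database name is an assumption

pipeline = [
    {"$lookup": {
        "from": "comments",
        "as": "match_docs",
        "let": {"indicator_id": "$_id"},
        "pipeline": [
            {"$match": {"$expr": {"$eq": ["$obj_id", "$$indicator_id"]}}},
            # {"$sort": {"createdAt": 1}},  # sort first if "100" should mean the oldest 100
            {"$limit": 100},
        ],
    }}
]

for doc in db.indicators.aggregate(pipeline):
    print(doc["_id"], len(doc["match_docs"]))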
answered by Dan Karbayev


If a $unwind immediately follows the $lookup, the pipeline is optimized: the two stages are coalesced into one, which helps bypass the 16 MB BSON document size limit that could otherwise result from the $lookup returning a large number of documents.

Keep in mind that if the size of a single document in the foreign collection plus the size of the document in the local collection exceeds 16 MB, this optimization cannot help.
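For reference, a sketch of the coalesced form in PyMongo, using the fields from the question (connection string and database name are assumptions); the $unwind has to directly follow the $lookup for the optimization to kick in:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
db = client["test"]                                # database name is an assumption

pipeline = [
    {"$lookup": {
        "from": "comments",
        "localField": "_id",
        "foreignField": "obj_id",
        "as": "match_docs",
    }},
    # Placing $unwind directly after $lookup lets the server coalesce the two
    # stages, streaming one matching comment per output document instead of
    # materializing them all in a single (potentially >16 MB) array.
    {"$unwind": "$match_docs"},
]

for doc in db.indicators.aggregate(pipeline):
    print(doc["_id"], doc["match_docs"])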

answered by Pete Garafano