
What is the most efficient way to query two collections in MongoDB for search results with pagination

Here is the scenario:

  • There are two collections
  • The first collection has a one-to-many relationship with the second collection; each document in the second collection belongs to exactly one document in the first.
  • The query is performed on 3 fields, all of which are indexed
  • 2 of those indexes are on the first collection and 1 is on the second collection
  • The results need to support pagination

Currently the best I could come up with is using an aggregation. The stages look something like this:

Aggregation -> match on 2 indexed values on the first collection -> sort -> lookup with a pipeline that has a match on the relationship property in both collections AND match based on the potential search value on an indexed value in the second collection -> match with OR that looks at 2 search fields in the first collection using regex or if the project from the lookup contained any results -> limit -> project with values

The concern is that the search will join every document in the first collection with the second collection during the lookup. Keep in mind that everything being searched is indexed, but the lookup is the major concern here. Any suggestions for doing this the right way, or a better way?

Code example:

db.collection1.aggregate( [
    { 
     $match: { // initial filters based on indexed values
        field1: "somevalue", 
        field2: "somevalue" 
     },
    },
    {
        $sort: {
            firstSortField: -1, _id: -1 // sort results by needed order
        }
    },
    {
        $lookup: { // join with another collection to search on a specific value
            from: "collection2",
            localField: "someLocalField",
            foreignField: "someForeignField",
            as: "someJoinedFields"
        }
    },
    {
        $addFields: {
            extraField: "$someJoinedFields.someExtaField" // array of values taken from the joined documents
        }
    },
    {
        $match: {
            $or: [
                { field3: { $regex: "" } }, // potential search field
                { field4: { $regex: "" } }, // potential search field
                { extraField: { $regex: "" } } // potential search field
            ]
        }
    },
    {
        $limit: 100 // limit to 100 results for pagination
    },
    {
        $project: { // final results
            finalField: 1,
            finalField2: 1,
            finalField3: 1
        }
    }
 ])
Stokedbits asked Dec 18 '21 22:12


1 Answer

Problem

Sadly, your schema does not fit this need efficiently, for the following reasons:

  • Pagination needs a stable, deterministic sort order on the collection so that you can keep track of the last element and make sure no element is skipped or repeated.
    • You include _id in the sort, which is good because it is guaranteed to be unique.
    • You don't keep track of the last element, though (basically the role $skip would play in your example).
  • The $lookup should be done after the $limit (as you said :D).
    • By doing so you avoid joining a lot of documents (it gets slow really fast!).
  • No $match should be done after the $limit (as you currently do :D).
    • If you put a $match after it, the page may end up with fewer elements than you wanted.

Basically, you are asking for something that lets you do $skip before $lookup, $match before $skip, and $lookup before $skip (because the $match needs the joined field). This is not possible!
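
To make the "keep track of the last element" point concrete, here is a minimal sketch of keyset (range-based) pagination on the first collection only. The helper name buildPagePipeline and the shape of the `last` cursor object are assumptions for illustration; field1, field2 and firstSortField are the placeholders from the question. It does not solve the search on the joined field, it only shows how a page can resume after the previous one without $skip.

```javascript
// Hypothetical helper: builds the stages for one page using keyset pagination.
// `last` is an assumed cursor object holding the sort-key values of the final
// document of the previous page (null for the first page).
function buildPagePipeline(filters, last, pageSize) {
  const match = { field1: filters.field1, field2: filters.field2 };
  if (last) {
    // Resume strictly after the last document seen, following the
    // (firstSortField desc, _id desc) order used in $sort below.
    match.$or = [
      { firstSortField: { $lt: last.firstSortField } },
      { firstSortField: last.firstSortField, _id: { $lt: last._id } },
    ];
  }
  return [
    { $match: match },
    { $sort: { firstSortField: -1, _id: -1 } },
    { $limit: pageSize },
  ];
}

const filters = { field1: "somevalue", field2: "somevalue" };
const firstPage = buildPagePipeline(filters, null, 100);
const nextPage = buildPagePipeline(filters, { firstSortField: 42, _id: 7 }, 100);
// In mongosh (not run here): db.collection1.aggregate(nextPage)
```

The caller would read firstSortField and _id from the last document of each page and pass them back in as `last` for the next request.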

Solution(s)

All the solutions that come to mind are actually difficult to implement.

Make your DB embedded

https://docs.mongodb.com/manual/tutorial/model-embedded-one-to-many-relationships-between-documents/

One of the best things about MongoDB is how easy it is to change the structure of documents. If you can do so without breaking some other feature, do it. By embedding the document, you no longer need the $lookup, which removes the problem completely. (You can then even make the limit much larger, as the only "slow" thing will be the collect phase.)
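
As a rough sketch of what the embedded variant could look like: the array name `related` and the helper buildEmbeddedSearch are assumptions, not part of the question; the other field names (including someExtaField, spelled as in the question) are the question's placeholders. With the second collection's values embedded, the whole search fits in a single $match placed before $limit.

```javascript
// Assumed document shape: each collection1 document embeds its related
// collection2 values in an array called `related` (hypothetical name).
const exampleDoc = {
  _id: 1,
  field1: "somevalue",
  field2: "somevalue",
  field3: "alpha",
  field4: "beta",
  firstSortField: 10,
  related: [{ someExtaField: "gamma" }, { someExtaField: "delta" }],
};

// Hypothetical helper: builds the search pipeline without any $lookup.
function buildEmbeddedSearch(term, pageSize) {
  return [
    {
      $match: {
        field1: "somevalue",
        field2: "somevalue",
        $or: [
          { field3: { $regex: term } },
          { field4: { $regex: term } },
          // Dot notation reaches into every element of the embedded array.
          { "related.someExtaField": { $regex: term } },
        ],
      },
    },
    { $sort: { firstSortField: -1, _id: -1 } },
    { $limit: pageSize },
    { $project: { finalField: 1, finalField2: 1, finalField3: 1 } },
  ];
}

const embeddedPipeline = buildEmbeddedSearch("gam", 100);
// In mongosh (not run here): db.collection1.aggregate(embeddedPipeline)
```

Because filtering happens before $limit, each page keeps the requested number of results. User-supplied search terms would normally need regex escaping, which this sketch leaves out.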

Create a materialized view

https://docs.mongodb.com/manual/core/materialized-views/

This lets you keep the original structure while getting the speed of the embedded query. This method will slow down your writes, as you will have to refresh the view on each insert or edit in either collection, but reads will be faster.
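
One possible way to build such a view is an aggregation that ends in $merge, sketched below. The target collection name collection1_search is an assumption; the $lookup fields are the placeholders from the question. How and when to re-run it (on every write, on a schedule, and so on) is left open here.

```javascript
// Sketch of a refresh pipeline for an on-demand materialized view.
// "collection1_search" is a hypothetical name for the view collection.
const refreshSearchView = [
  {
    $lookup: {
      from: "collection2",
      localField: "someLocalField",
      foreignField: "someForeignField",
      as: "someJoinedFields",
    },
  },
  // Flatten the joined values into one searchable array field.
  { $addFields: { extraField: "$someJoinedFields.someExtaField" } },
  // Drop the raw joined documents so the view stays small.
  { $project: { someJoinedFields: 0 } },
  // Write the result into the view collection, replacing existing entries.
  {
    $merge: {
      into: "collection1_search",
      on: "_id",
      whenMatched: "replace",
      whenNotMatched: "insert",
    },
  },
];
// In mongosh (not run here): db.collection1.aggregate(refreshSearchView)
```

Searches would then run against collection1_search with a plain $match, $sort and $limit, and the search fields would need their own indexes on that collection.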

Mongo 5 improved $lookup

https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#correlated-subqueries-using-concise-syntax

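With this method, you will merge far fewer documents, and you can easily discard the ones you don't need. The concise syntax (MongoDB 5.0 and later) lets a single $lookup combine localField/foreignField with a pipeline, so the joined documents can be filtered inside the join itself.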
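
A minimal sketch of that concise form, assuming MongoDB 5.0 or later. The helper name buildLookupStage is hypothetical; the field names are the question's placeholders.

```javascript
// Hypothetical helper: builds a $lookup stage that joins on
// someLocalField/someForeignField and filters the joined documents
// by the search term inside the sub-pipeline.
function buildLookupStage(term) {
  return {
    $lookup: {
      from: "collection2",
      localField: "someLocalField",
      foreignField: "someForeignField",
      pipeline: [
        // Only joined documents matching the search term are kept.
        { $match: { someExtaField: { $regex: term } } },
        // Keep just the field needed by the later search $match.
        { $project: { _id: 0, someExtaField: 1 } },
      ],
      as: "someJoinedFields",
    },
  };
}

const lookupStage = buildLookupStage("gam");
// In mongosh (not run here), this stage would replace the plain $lookup
// in the question's pipeline.
```

A later $match can then test whether someJoinedFields is non-empty instead of running a regex over the merged values.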

Sum up

When to change the structure

The way to go is to change the DB structure. I understand, though, that this can be impossible for various reasons.

When to use the materialized view

Use it when you don't have many write operations and have no storage constraints (as this will occupy double the space).

When to use $lookup

Use $lookup if you cannot change the structure and cannot afford slower writes. This is by far the slowest of the three methods, but it will still give you a boost. In the long term, this method might not be a solution at all; it all depends on your use cases :D

Andrea answered Oct 08 '22 17:10