
Incredibly slow query performance with $lookup and "sub" aggregation pipeline

Let's say I have two collections, tasks and customers.

Customers have a 1:n relation with tasks via a "customerId" field on the task documents.

I now have a view where I need to display tasks together with customer names, and I also need to be able to filter and sort by customer name. That means I can't place the $limit stage or the customer-name $match stage before the $lookup in the following query.

So here is my example query:

db.task.aggregate([
    {
        "$match": {
            "_deleted": false
        }
    },
    {
        "$lookup": {
            "from": "customer",
            "let": {
                "foreignId": "$customerId"
            },
            "pipeline": [
                {
                    "$match": {
                        "$expr": {
                            "$and": [
                                {
                                    "$eq": [
                                        "$_id",
                                        "$$foreignId"
                                    ]
                                },
                                {
                                    "$eq": [
                                        "$_deleted",
                                        false
                                    ]
                                }
                            ]
                        }
                    }
                }
            ],
            "as": "customer"
        }
    },
    {
        "$unwind": {
            "path": "$customer",
            "preserveNullAndEmptyArrays": true
        }
    },
    {
        "$match": {
            "customer.name": 'some_search_string'
        }
    },
    {
        "$sort": {
            "customer.name": -1
        }
    },
    {
        "$limit": 35
    },
    {
        "$project": {
            "_id": 1,
            "customer._id": 1,
            "customer.name": 1,
            "description": 1,
            "end": 1,
            "start": 1,
            "title": 1
        }
    }
])

This query gets incredibly slow as the collections grow. With 1,000 tasks and 20 customers it already takes about 500 ms to deliver a result.

I'm aware that this happens because the $lookup operator has to do a collection scan on customer for every document that enters the pipeline's $lookup stage.

I have tried to set indexes as described here: Poor lookup aggregation performance, but that doesn't seem to have any impact.

My next guess was that the "sub"-pipeline in the $lookup stage is not capable of using indexes, so I replaced it with a simple

"$lookup": {
    "from": "customer",
    "localField": "customerId",
    "foreignField": "_id",
    "as": "customer"
}

But still the indexes are not used, or at least they have no impact on performance. (To be honest, I don't know which of the two is the case, since I haven't managed to get useful .explain() output for my aggregation pipelines.)
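A minimal sketch of how explain output can be requested for the aggregation in the mongo shell (default queryPlanner verbosity; I'm not sure how much it reveals about the $lookup's inner index usage on 4.0):

// Explain the aggregation instead of running it; the output contains the plan
// chosen for the initial $match (IXSCAN vs. COLLSCAN) and the optimized pipeline.
db.task.explain().aggregate([
    { "$match": { "_deleted": false } },
    {
        "$lookup": {
            "from": "customer",
            "localField": "customerId",
            "foreignField": "_id",
            "as": "customer"
        }
    }
])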

I have tried the following indexes (created roughly as in the sketch after this list):

  • Ascending, descending, hashed and text indexes on customerId
  • Ascending, descending, hashed and text indexes on customer.name
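
A minimal mongo shell sketch of how the plain ascending variants of those indexes are created (field names taken from the collections above; the other variants only differ in the index specification):

// Ascending index on the task field that the $lookup uses as its local field
db.task.createIndex({ "customerId": 1 })

// Ascending index on the customer name field used for filtering and sorting
db.customer.createIndex({ "name": 1 })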

I'm grateful for any ideas on what I'm doing wrong or how I could achieve the same thing with a better aggregation pipeline.

Additional info: I'm using a three-member replica set on MongoDB 4.0.

Please note: I'm aware that I'm using a non-relational database for highly relational objectives, but in this project MongoDB was our choice because of its Change Streams feature. If anybody knows a different database with a comparable feature (realtime push notifications on changes) that can be run on-premises (which rules out Firebase), I would love to hear about it!
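
For context, a minimal mongo shell sketch of the Change Streams usage we rely on (collection name from above; requires a replica set, which we run anyway):

// Open a change stream on the task collection and print every change event.
var watchCursor = db.task.watch();
while (!watchCursor.isExhausted()) {
    if (watchCursor.hasNext()) {
        printjson(watchCursor.next());
    }
}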

Thanks in advance!

Asked Jan 31 '19 by Stefan Holzapfel

1 Answer

I found out why my indexes weren't used.

I queried the collection using a different collation than the collection's own, but the _id index on a collection is always built with the collection's default collation.

Therefore the indexes were not used.

I changed the collection's collation to match the one used in the queries, and now the query takes just a fraction of the time (but it's still slow :)).

(Yes, you have to recreate the collections to change the collation; no on-the-fly change is possible.)
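
For illustration, this is roughly what the fix looks like in the mongo shell; the "de" locale is just a placeholder, not necessarily the collation we actually use:

// Recreate the collections with the collation the queries will use
// ("de" is only an illustrative locale).
db.createCollection("customer", { "collation": { "locale": "de" } })
db.createCollection("task", { "collation": { "locale": "de" } })

// An aggregation that specifies the same collation can now use the default
// _id index (and any other indexes built with that collation); "pipeline"
// stands for the aggregation stages from the question.
db.task.aggregate(pipeline, { "collation": { "locale": "de" } })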

Answered Sep 28 '22 by Stefan Holzapfel