
Poor lookup aggregation performance


I have two collections

Posts:

{
    "_Id": "1",
    "_PostTypeId": "1",
    "_AcceptedAnswerId": "192",
    "_CreationDate": "2012-02-08T20:02:48.790",
    "_Score": "10",
    ...
    "_OwnerUserId": "6",
    ...
},
...

and users:

{
    "_Id": "1",
    "_Reputation": "101",
    "_CreationDate": "2012-02-08T19:45:13.447",
    "_DisplayName": "Geoff Dalgas",
    ...
    "_AccountId": "2"
},
...

and I want to find users who have written between 5 and 15 posts. This is what my query looks like:

db.posts.aggregate([
    {
        $lookup: {
            from: "users",
            localField: "_OwnerUserId",
            foreignField: "_AccountId",
            as: "X"
        }
    },
    {
        $group: {
            _id: "$X._AccountId",
            posts: { $sum: 1 }
        }
    },
    {
        $match: { posts: { $gte: 5, $lte: 15 } }
    },
    {
        $sort: { posts: -1 }
    },
    {
        $project: { posts: 1 }
    }
])

and it is terribly slow. For 6k users and 10k posts it took over 40 seconds to get a response, while in a relational database I get a response in a split second. Where's the problem? I'm just getting started with MongoDB and it's quite possible that I messed up this query.

user3616181 asked May 02 '17 16:05




1 Answer

From https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/:

"foreignField: Specifies the field from the documents in the from collection. $lookup performs an equality match on the foreignField to the localField from the input documents. If a document in the from collection does not contain the foreignField, the $lookup treats the value as null for matching purposes."

This equality match is performed the same way as any other query.

If you don't have an index on the field _AccountId, it will do a full collection scan of users for each one of the 10,000 posts. The bulk of the time will be spent in those scans.

db.users.createIndex({ "_AccountId": 1 })

speeds up the process so it's doing 10,000 index lookups instead of 10,000 collection scans.
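
To make the difference concrete, here is a rough in-memory analogy in plain JavaScript (runnable in Node, no database needed). It is only a sketch and not how MongoDB implements $lookup: the data is synthetic, the function names joinByScan and joinByIndex are made up for illustration, and a Map stands in for the index (MongoDB's real indexes are B-trees). The collection sizes are the ones from the question.

```javascript
// Synthetic stand-ins for the two collections (6k users, 10k posts).
// Assumption: each post's _OwnerUserId matches exactly one user's _AccountId.
const users = Array.from({ length: 6000 }, (_, i) => ({ _AccountId: String(i) }));
const posts = Array.from({ length: 10000 }, (_, i) => ({ _OwnerUserId: String(i % 6000) }));

// No index: every post triggers a scan over all users.
function joinByScan(posts, users) {
  let comparisons = 0;
  let matches = 0;
  for (const post of posts) {
    for (const user of users) {
      comparisons++;
      if (user._AccountId === post._OwnerUserId) matches++;
    }
  }
  return { comparisons, matches };
}

// "Index": build a Map keyed by _AccountId once, then do one lookup per post.
function joinByIndex(posts, users) {
  const index = new Map();
  for (const user of users) {
    if (!index.has(user._AccountId)) index.set(user._AccountId, []);
    index.get(user._AccountId).push(user);
  }
  let lookups = 0;
  let matches = 0;
  for (const post of posts) {
    lookups++;
    matches += (index.get(post._OwnerUserId) || []).length;
  }
  return { lookups, matches };
}

const scan = joinByScan(posts, users);
const indexed = joinByIndex(posts, users);
console.log(scan.comparisons); // 60000000
console.log(indexed.lookups);  // 10000
```

Both functions find the same matches; the scan version just does 6,000 comparisons per post to get there, which is the work the index removes.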

bauman.space answered Oct 20 '22 09:10