I have two collections, posts:
{ "_Id": "1", "_PostTypeId": "1", "_AcceptedAnswerId": "192", "_CreationDate": "2012-02-08T20:02:48.790", "_Score": "10", ... "_OwnerUserId": "6", ... }, ...
and users:
{ "_Id": "1", "_Reputation": "101", "_CreationDate": "2012-02-08T19:45:13.447", "_DisplayName": "Geoff Dalgas", ... "_AccountId": "2" }, ...
and I want to find users who have written between 5 and 15 posts. This is what my query looks like:
db.posts.aggregate([
  { $lookup: { from: "users", localField: "_OwnerUserId", foreignField: "_AccountId", as: "X" } },
  { $group: { _id: "$X._AccountId", posts: { $sum: 1 } } },
  { $match: { posts: { $gte: 5, $lte: 15 } } },
  { $sort: { posts: -1 } },
  { $project: { posts: 1 } }
])
and it is terribly slow. With 6k users and 10k posts it takes over 40 seconds to get a response, while a relational database answers in a split second. Where's the problem? I'm just getting started with MongoDB, and it's quite possible that I messed up this query.
It is slow because it is not using an index. For each document in the posts collection, it does a full collection scan on the users collection.
The MongoDB $lookup stage, by definition, "Performs a left outer join to an unsharded collection in the same database to filter in documents from the "joined" collection for processing." Simply put, using the MongoDB Lookup operator makes it possible to merge data from the document you are running a query on and the ...
from https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
foreignField Specifies the field from the documents in the from collection. $lookup performs an equality match on the foreignField to the localField from the input documents. If a document in the from collection does not contain the foreignField, the $lookup treats the value as null for matching purposes.
That equality match is executed like any other query against the users collection, so it can use an index on the foreignField if one exists.
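To make the matching rule concrete, here is a small in-memory model of what a localField/foreignField $lookup computes, including the null rule quoted above. It is a sketch, not MongoDB's implementation: the function name lookup and the sample documents are hypothetical, and details such as array-valued fields and BSON type ordering are ignored.

```javascript
// Hypothetical helper: a simplified in-memory model of a $lookup stage
// with localField/foreignField. NOT MongoDB's implementation; it ignores
// array-valued fields and BSON type ordering.
function lookup(inputDocs, fromDocs, localField, foreignField, as) {
  // A missing field is treated as null for matching purposes.
  const valueOf = (doc, field) => (doc[field] === undefined ? null : doc[field]);
  return inputDocs.map((doc) => {
    const key = valueOf(doc, localField);
    // Equality match against every document of the "from" collection;
    // without an index, this is a full scan for each input document.
    const matches = fromDocs.filter((other) => valueOf(other, foreignField) === key);
    // Left outer join: input documents with no match keep an empty array.
    return { ...doc, [as]: matches };
  });
}

// Hypothetical sample documents shaped like the ones in the question.
const posts = [
  { _Id: "1", _OwnerUserId: "6" },
  { _Id: "2", _OwnerUserId: "99" },
  { _Id: "3" }, // no _OwnerUserId field
];
const users = [
  { _Id: "6", _AccountId: "6" },
  { _Id: "7" }, // no _AccountId field
];

const joined = lookup(posts, users, "_OwnerUserId", "_AccountId", "X");
console.log(joined.map((p) => p.X.map((u) => u._Id)));
// [ [ '6' ], [], [ '7' ] ]
```

Post "2" has no matching user but is still returned with an empty X, and post "3" (missing _OwnerUserId) matches the user that is missing _AccountId, because both sides are treated as null.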
If you don't have an index on the field _AccountId, MongoDB will do a full collection scan of users for each one of the 10,000 posts. The bulk of the time will be spent in those scans.
db.users.createIndex({ _AccountId: 1 })
Creating that index speeds up the process, so the $lookup performs 10,000 index lookups instead of 10,000 collection scans.
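The difference between one full scan per post and one index probe per post can be made concrete with a small in-memory model. This is a sketch under assumptions: the data is synthetic and only sized like the question (6,000 users, 10,000 posts), the helper names joinByScan and joinByIndex are hypothetical, and a JavaScript Map stands in for the _AccountId index (MongoDB actually uses B-tree indexes, so each real probe costs O(log n) rather than O(1)).

```javascript
// Nested-loop join: every post compares its owner id against every user.
function joinByScan(posts, users) {
  let comparisons = 0;
  const result = posts.map((post) => {
    const matches = [];
    for (const user of users) {
      comparisons++;
      if (user._AccountId === post._OwnerUserId) matches.push(user);
    }
    return { ...post, X: matches };
  });
  return { result, comparisons };
}

// Index join: build a Map from _AccountId to users once, then do one
// probe per post. The Map only stands in for a real (B-tree) index.
function joinByIndex(posts, users) {
  const index = new Map();
  for (const user of users) {
    if (!index.has(user._AccountId)) index.set(user._AccountId, []);
    index.get(user._AccountId).push(user);
  }
  let probes = 0;
  const result = posts.map((post) => {
    probes++;
    return { ...post, X: index.get(post._OwnerUserId) || [] };
  });
  return { result, probes };
}

// Synthetic, hypothetical data sized like the question.
const users = Array.from({ length: 6000 }, (_, i) => ({ _AccountId: String(i) }));
const posts = Array.from({ length: 10000 }, (_, i) => ({
  _Id: String(i),
  _OwnerUserId: String(i % 6000),
}));

const scan = joinByScan(posts, users);
const indexed = joinByIndex(posts, users);
console.log(scan.comparisons); // 60000000
console.log(indexed.probes); // 10000
```

Both functions produce the same joined documents; the scan version does 10,000 × 6,000 = 60,000,000 comparisons, while the indexed version does 10,000 probes plus a single pass to build the index.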