
MongoDB sorting by secondary lookup table

Tags:

mongodb

I have a dating app, and I store all the potential Match objects in MongoDB (a Match object is created when a user swipes left or right):

{
   uid1: <userid1>,
   uid2: <userid2>,

   uid1action: <L|R|E> (left/right/empty, based on what user1 has done),
   uid2action: <L|R|E> (left/right/empty, based on what user2 has done),
}

Now to my question. When I show profiles of potential users to user1, I take into account all the people who have already liked user1 (because I prioritise these profiles):

var likedQuery = new Parse.Query("Match")
    .equalTo("uid1", userId)
    .equalTo("uid2action", "L") // user2 has liked user1
    .equalTo("uid1action", "E") // user1 has not done anything
    .select("uid2")
    .limit(paginationLimit);

This works nicely. I am now looking to also order the likedQuery results by the number of likes each user has received (popularity).

Say these are the following users who have liked user1:

Paul (Paul has been liked by 50 people)

Logan (Logan has been liked by 20 people)

Michael (Michael has been liked by 80 people)

We want to order all these people such that Michael would be the first profile user1 sees.
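
To make the desired ordering concrete, here is a minimal in-memory sketch in plain JavaScript (no database involved). The `likers` array and the `sortByPopularity` helper are hypothetical names used only for illustration; the like counts are the ones from the example above.

```javascript
// Hypothetical in-memory data: users who liked user1, with the number of likes each has received.
const likers = [
  { name: "Paul", likesReceived: 50 },
  { name: "Logan", likesReceived: 20 },
  { name: "Michael", likesReceived: 80 },
];

// Return a new array sorted by likesReceived, most popular first (input is not mutated).
function sortByPopularity(users) {
  return [...users].sort((a, b) => b.likesReceived - a.likesReceived);
}

const ordered = sortByPopularity(likers).map((u) => u.name);
console.log(ordered); // [ 'Michael', 'Paul', 'Logan' ]
```

The open question is how to get the same ordering out of MongoDB when the counts are not stored alongside each user.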

How do I do this in MongoDB? In SQL this would be fairly trivial: JOIN the table, order by an aggregate such as SUM() or COUNT(), and make sure the necessary indexes exist.

In MongoDB, the only way I can see to do it is to add a uid2likes field (to sort on) to each Match object and have a cron job increment it, but that is ridiculous and doesn't scale.

My question is more about how to do this in a way that scales.

Erti-Chris Eelmaa asked Oct 29 '18 20:10


2 Answers

You can use the aggregation query below in MongoDB 3.4.

The idea is to $match all the users who liked user1, then run a self $lookup to find everyone who liked those users.

$group and $sort order the matches by like count, descending.

$limit caps the number of matched users returned.

db.colname.aggregate([
  {"$match":{"uid1":userID,"uid2action":"L","uid1action":"E"}},
  {"$lookup":{
    "from":colname,
    "localField":"uid2",
    "foreignField":"uid1",
    "as":"uid2likes"
  }},
  {"$unwind":"$uid2likes"},
  {"$match":{"uid2likes.uid2action":"L"}},
  {"$group":{
    "_id":{"uid1":"$uid1","uid2":"$uid2"},
    "uid2likecount":{"$sum":1}
  }},
  {"$sort":{"uid2likecount":-1}},
  {"$limit":paginationLimit}
])
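
To see what each stage contributes, the pipeline can be mimicked in plain JavaScript against an in-memory array. This is only a sketch of the logic, not how MongoDB executes it; `rankLikers` and `sampleMatches` are hypothetical names, the data is made up, and it assumes each (uid1, uid2) pair appears in at most one document.

```javascript
// Plain-JavaScript walk-through of the stages above, run on an array instead of a collection.
function rankLikers(matches, userID, paginationLimit) {
  // $match: documents where uid2 liked userID and userID has not acted yet.
  const candidates = matches.filter(
    (m) => m.uid1 === userID && m.uid2action === "L" && m.uid1action === "E"
  );

  const counted = [];
  for (const c of candidates) {
    // $lookup + $unwind + $match: documents whose uid1 is this candidate
    // and where the other side liked them.
    const likes = matches.filter(
      (m) => m.uid1 === c.uid2 && m.uid2action === "L"
    );
    // $unwind drops candidates whose lookup result is empty, so skip them here too.
    if (likes.length === 0) continue;
    // $group: one row per (uid1, uid2) pair with the number of joined documents.
    counted.push({ uid1: c.uid1, uid2: c.uid2, uid2likecount: likes.length });
  }

  // $sort by count descending, then $limit.
  return counted
    .sort((a, b) => b.uid2likecount - a.uid2likecount)
    .slice(0, paginationLimit);
}

// Made-up sample data.
const sampleMatches = [
  { uid1: "u1", uid2: "paul", uid1action: "E", uid2action: "L" },
  { uid1: "u1", uid2: "logan", uid1action: "E", uid2action: "L" },
  { uid1: "u1", uid2: "nobody", uid1action: "E", uid2action: "L" },
  { uid1: "paul", uid2: "a", uid1action: "E", uid2action: "L" },
  { uid1: "paul", uid2: "b", uid1action: "E", uid2action: "L" },
  { uid1: "logan", uid2: "a", uid1action: "E", uid2action: "L" },
  { uid1: "logan", uid2: "b", uid1action: "R", uid2action: "R" }, // not a like
];

console.log(rankLikers(sampleMatches, "u1", 10).map((r) => r.uid2)); // [ 'paul', 'logan' ]
```

Note that "nobody" disappears from the result because no joined document survives for that user, which mirrors what $unwind does with an empty array.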

A couple of notes:

It is important to use $lookup + $unwind + $match in that order: in 3.4 the optimizer coalesces this sequence by moving the $match predicate inside the $lookup.

You may be able to use an existing index (assuming you have one on uid1) for both the initial match and the lookup match.

Also try adding an index on uid2action and see whether it is picked up by the $lookup + $match stage.

Add indexes:

db.colname.createIndex( { uid1: 1 } )
db.colname.createIndex( { uid2action: 1 } )

Measure Index Use:

db.colname.aggregate([{$indexStats: {}}, {$project: {key: 0, host: 0}}]).pretty();

Explain Query:

db.colname.explain("executionStats").aggregate(pipeline); // pipeline = the array of stages shown above

You can alternate between indexes and check the execution stats to see which ones are picked up. Try compound indexes as well.
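
As a rough guide to choosing a compound index, the sketch below models the index prefix rule in plain JavaScript: a compound index can serve equality filters on its leading fields. The key pattern `compoundIndexKeys` and the helper `matchedPrefixLength` are hypothetical and deliberately simplified; the explain output remains the authority on what the planner actually uses.

```javascript
// Hypothetical compound index key pattern covering the fields of the initial $match.
// In the mongo shell it would be passed to db.colname.createIndex(...).
const compoundIndexKeys = { uid1: 1, uid2action: 1, uid1action: 1 };

// Simplified model of the index prefix rule: count how many leading index
// fields have an equality predicate in the query. In this model, zero means
// the index does not help with the query's equality filters.
function matchedPrefixLength(indexKeys, query) {
  let matched = 0;
  for (const field of Object.keys(indexKeys)) {
    if (!(field in query)) break;
    matched += 1;
  }
  return matched;
}

const initialMatch = { uid1: "someUserId", uid2action: "L", uid1action: "E" };
const lookupMatch = { uid1: "someOtherUserId", uid2action: "L" };

console.log(matchedPrefixLength(compoundIndexKeys, initialMatch)); // 3
console.log(matchedPrefixLength(compoundIndexKeys, lookupMatch)); // 2
console.log(matchedPrefixLength({ uid2action: 1 }, { uid1: "x" })); // 0
```

Under this model a single index led by uid1 could serve both the initial match and the lookup match, which is worth confirming with the execution stats.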

With 3.6 you can clean up the query a little:

db.colname.aggregate([
  {"$match":{"uid1":userID,"uid2action":"L","uid1action":"E"}},
  {"$lookup":{
    "from":colname,
    "let":{"uid2":"$uid2"},
    "pipeline":[
       {"$match":{"$expr":{"$eq":["$uid1","$$uid2"]},"uid2action":"L"}}, 
       {"$count":"count"}
    ],
    "as":"uid2likes"
  }},
  {"$unwind":"$uid2likes"},
  {"$sort":{"uid2likes.count":-1}},
  {"$limit":paginationLimit}
])
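
One thing to be aware of: $count emits no document when nothing matches, so uid2likes is an empty array for a user nobody has liked, and a plain $unwind drops that user. If such users should still be returned, a possible variant is to set preserveNullAndEmptyArrays on $unwind. The sketch below only builds the pipeline array; `buildLikedPipeline` is a hypothetical helper, and the collection name, user id, and limit are placeholders.

```javascript
// Builds a variant of the 3.6 pipeline above. colname, userID and paginationLimit
// are supplied by the caller, as in the original query.
function buildLikedPipeline(colname, userID, paginationLimit) {
  return [
    { $match: { uid1: userID, uid2action: "L", uid1action: "E" } },
    { $lookup: {
        from: colname,
        let: { uid2: "$uid2" },
        pipeline: [
          { $match: { $expr: { $eq: ["$uid1", "$$uid2"] }, uid2action: "L" } },
          { $count: "count" },
        ],
        as: "uid2likes",
    } },
    // Keep documents whose lookup result is empty instead of dropping them.
    { $unwind: { path: "$uid2likes", preserveNullAndEmptyArrays: true } },
    { $sort: { "uid2likes.count": -1 } },
    { $limit: paginationLimit },
  ];
}

// Usage in the mongo shell (not executed here):
//   db.colname.aggregate(buildLikedPipeline("colname", userID, paginationLimit))
const pipeline = buildLikedPipeline("colname", "someUserId", 20);
console.log(JSON.stringify(pipeline[2]));
```

In a descending sort, documents without a count come after those that have one, so users with no likes end up at the bottom of the list.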
s7vr answered Sep 24 '22 00:09


This answer depends on the MongoDB version you are using.

I would use the aggregation pipeline.

You didn't provide much information about your schema, so these are my assumptions:

  1. your Match object collection's name is match
  2. your user collection's name is users
  3. each document in the users collection has a likes field which holds a Number

The query below returns a list of users who voted L for userID (i.e. the current user), sorted by their total likes in descending order.

db.match.aggregate([
  { $match: { "uid1": userID, "uid1action": "E", "uid2action": "L" } },
  { $project: { _id: 0, uid2: 1 } },
  { $lookup: {
      from: "users",
      let: { uid: "$uid2" },
      pipeline: [
          { $match: { $expr: { $eq: [ "$_id", "$$uid" ] } } },
          { $project: { _id: 0, likes: 1 } },
      ],
      as: "likes" }
  },
  { $unwind: "$likes" },
  { $project: { _id: "$uid2", likes: "$likes.likes" } },
  { $sort: { likes: -1 } },
  { $limit: paginationLimit }
])

Aggregation Pipeline Explanation:

From the match collection, get all documents that match the criteria (uid1 is the current user, uid1action is "E" and uid2action is "L").

Keep only the uid2 field from every Match object.

Perform an inner search in the users collection. Match all users whose _id equals uid2. Take only the likes field from each matching document. Return the results as a likes field.

That last step returns a list, so unwind it.

From the result, take only the uid2 field as _id and the likes.likes field as likes.

Sort the results according to the likes field.

Limit the results according to paginationLimit.
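
The same steps can be followed on in-memory arrays to check the expected result. This is a sketch under the assumptions listed above (a users collection with a numeric likes field); `matchDocs`, `userDocs`, and `likersByPopularity` are hypothetical names and the data is made up.

```javascript
// Made-up in-memory stand-ins for the assumed "match" and "users" collections.
const matchDocs = [
  { uid1: "u1", uid2: "paul", uid1action: "E", uid2action: "L" },
  { uid1: "u1", uid2: "logan", uid1action: "E", uid2action: "L" },
  { uid1: "u1", uid2: "michael", uid1action: "E", uid2action: "L" },
  { uid1: "u1", uid2: "sam", uid1action: "L", uid2action: "L" }, // u1 already acted
];
const userDocs = [
  { _id: "paul", likes: 50 },
  { _id: "logan", likes: 20 },
  { _id: "michael", likes: 80 },
];

function likersByPopularity(matches, users, userID, paginationLimit) {
  const likesById = new Map(users.map((u) => [u._id, u.likes]));
  return matches
    // $match
    .filter((m) => m.uid1 === userID && m.uid1action === "E" && m.uid2action === "L")
    // $lookup + $unwind: only users that exist in the users collection survive
    .filter((m) => likesById.has(m.uid2))
    // $project
    .map((m) => ({ _id: m.uid2, likes: likesById.get(m.uid2) }))
    // $sort descending
    .sort((a, b) => b.likes - a.likes)
    // $limit
    .slice(0, paginationLimit);
}

const top = likersByPopularity(matchDocs, userDocs, "u1", 2);
console.log(top); // [ { _id: 'michael', likes: 80 }, { _id: 'paul', likes: 50 } ]
```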

bergerg answered Sep 23 '22 00:09