Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to aggregate to get every combination of two users per movie grouping key?

Here is my collection:

{ "user" : 1, "rate" : 1, "movie" : 1}
{ "user" : 1, "rate" : 3, "movie" : 3}
{ "user" : 1, "rate" : 2, "movie" : 4}
{ "user" : 1, "rate" : 3, "movie" : 5}
{ "user" : 2, "rate" : 4, "movie" : 1}
{ "user" : 2, "rate" : 2, "movie" : 3}
{ "user" : 2, "rate" : 5, "movie" : 6}
{ "user" : 3, "rate" : 1, "movie" : 3}

Here is the result I want get:

{ "user1" : 1, "rate1" : 1,"user2" : 2, "rate2" : 4, "movie" : 1}
{ "user1" : 1, "rate1" : 3,"user2" : 2, "rate2" : 2, "movie" : 3}
{ "user1" : 1, "rate1" : 3,"user2" : 3, "rate2" : 1, "movie" : 3}
{ "user1" : 2, "rate1" : 2,"user2" : 3, "rate2" : 1, "movie" : 3}

For every "movie" that is every combination of the "user" and "rate" values as a pair where there is more than one "user" for that "movie"

like image 591
user6148078 Avatar asked Dec 18 '25 12:12

user6148078


1 Answers

You are looking for the "permutations" when grouping on "movie" for two or more "user" values. It's not the sort of thing you really get from a "database" query.

But if you $group on "movie" and $push the other data, you can then filter out any content with less than two entries in that array and work out your "possible combinations" from there. So you can ask the database to do the grouping and filtering, but the the rest is for algorithms in "set theory".

So this is "part" aggregation statement and "part" processing of the code in result:

db.movies.aggregate([
  { "$group": {
    "_id": "$movie",
    "people": {
      "$push": {
        "user": "$user",
        "rate": "$rate"
      }
    }
  }},
  { "$redact": {
    "$cond": {
      "if": { "$gt": [{ "$size": "$people" }, 1] },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    }
  }},
  { "$sort": { "_id": 1 } }
]).forEach(function(doc) {
  var n = doc.people.length;
  var i,j;

  for (i = 0; i < n; i++) {
    for (j = i + 1; j < n; j++) {
      printjson({ 
        "user1": doc.people[i].user, 
        "rate1": doc.people[i].rate,
        "user2": doc.people[j].user,
        "rate2": doc.people[j].rate,
        "movie": doc._id
      })
    }
  }
})

So the aggregation part itself first does a $group on the "movie" values as mentioned and creates the array with $push. Since not all results are going to have an array with more than one entry, you then remove those with $redact. This is a "logical filter" that uses $size to compare the generated array and see if it has "more than" ( $gt ) one entry.

The results at this stage look like this, after also applying a $sort:

{
  "_id" : 1,
  "people" : [
    {
      "user" : 1,
      "rate" : 1
    },
    {
      "user" : 2,
      "rate" : 4
    }
  ]
}
{
  "_id" : 3,
  "people" : [
    {
      "user" : 1,
      "rate" : 3
    },
    {
      "user" : 2,
      "rate" : 2
    },
    {
      "user" : 3,
      "rate" : 1
    }
  ]
}

The next part is really up to an "algorithm" to generate the possible "pair" combinations. It's a pretty common and well known approach, so you just run the loops on the arrays of each document returned in response to produce the result:

{ "user1" : 1, "rate1" : 1, "user2" : 2, "rate2" : 4, "movie" : 1 }
{ "user1" : 1, "rate1" : 3, "user2" : 2, "rate2" : 2, "movie" : 3 }
{ "user1" : 1, "rate1" : 3, "user2" : 3, "rate2" : 1, "movie" : 3 }
{ "user1" : 2, "rate1" : 2, "user2" : 3, "rate2" : 1, "movie" : 3 }
like image 62
Neil Lunn Avatar answered Dec 21 '25 02:12

Neil Lunn



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!