Mongo Group and sum with two fields

I have documents like:

{
   "from":"[email protected]",
   "to" :"[email protected]",
   "email": "Hi hello"
}

How can I count the messages exchanged between two people regardless of direction, so that "from A to B" and "from B to A" are counted together?

I am able to calculate the count in one direction only. I want the combined count for both directions.

db.test.aggregate([
      { $group: {
         "_id":{ "from": "$from", "to":"$to"},
           "count":{$sum:1} 
         }
      },
      { 
        "$sort" :{"count":-1}
      }
])

Somnath Muluk asked Mar 14 '23 05:03


2 Answers

Since you need to count the emails exchanged between two addresses, you can project a unified "between" field as follows:

db.a.aggregate([
    { $match: {
        to: { $exists: true },
        from: { $exists: true },
        email: { $exists: true }
    }}, 
    { $project: {
        between: { $cond: { 
            if: { $lte: [ { $strcasecmp: [ "$to", "$from" ] }, 0 ] }, 
            then: [ { $toLower: "$to" }, { $toLower: "$from" } ], 
            else: [ { $toLower: "$from" }, { $toLower: "$to" } ] }
        } 
    }},
    { $group: {
         "_id": "$between",
         "count": { $sum: 1 } 
    }},
    { $sort :{ count: -1 } }
])

The unification logic is this: "between" is an array holding both addresses, lowercased and sorted alphabetically, so a message from A to B and a message from B to A produce the same grouping key. The $match and $toLower parts are optional if you trust your data.
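
To make the normalization easier to follow outside of an aggregation pipeline, here is a minimal sketch of the same idea in plain JavaScript over an in-memory array. It is not the pipeline itself: the sample addresses and the names betweenKey and countBetween are hypothetical, chosen only for illustration.

```javascript
// Hypothetical sample messages; the addresses are made up for illustration.
const messages = [
  { from: "alice@example.com", to: "bob@example.com", email: "Hi" },
  { from: "bob@example.com", to: "Alice@example.com", email: "Hello" },
  { from: "carol@example.com", to: "alice@example.com", email: "Hey" },
];

// Build an order-independent key: lowercase both addresses, then sort them.
function betweenKey(msg) {
  return [msg.from.toLowerCase(), msg.to.toLowerCase()].sort();
}

// Count messages per unordered pair, then sort by count, highest first.
function countBetween(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // Arrays are compared by reference in a Map, so use a string as the key.
    const key = JSON.stringify(betweenKey(doc));
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([key, count]) => ({ _id: JSON.parse(key), count }))
    .sort((a, b) => b.count - a.count);
}

const pairCounts = countBetween(messages);
console.log(pairCounts);
```

With this sample data, the two messages between alice and bob collapse into one entry with a count of 2, even though they go in opposite directions and differ in letter case.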

Documentation for operators used in the example:

  • $match
  • $exists
  • $project
  • $cond
  • $lte
  • $strcasecmp
  • $toLower
  • $group
  • $sum
  • $sort

Alex Blex answered Mar 27 '23 04:03


You basically need to consider the _id for grouping as an "array" of the possible "to" and "from" values, and then of course "sort" them, so that in every document the combination is always in the same order.

As a side note, when I am dealing with messaging systems like this, the "to" and "from" sender/recipient fields are usually both arrays to begin with, so that usually forms the base from which the different variations on this statement come.

First, the optimal MongoDB 3.2 statement for single addresses:

db.collection.aggregate([
    // Join in array
    { "$project": {
        "people": [ "$to", "$from" ],
    }},

    // Unwind array
    { "$unwind": "$people" },

    // Sort array
    { "$sort": { "_id": 1, "people": 1 } },

    // Group document
    { "$group": {
        "_id": "$_id",
        "people": { "$push": "$people" }
    }},

    // Group people and count
    { "$group": {
        "_id": "$people",
        "count": { "$sum": 1 }
    }}
]);

That's the basics. The only variations are in how the "people" array is constructed (stage 1 above).

MongoDB 3.x and 2.6.x - Arrays

{ "$project": {
    "people": { "$setUnion": [ "$to", "$from" ] }
}}

MongoDB 3.x and 2.6.x - Fields to array

{ "$project": {
    "people": { 
        "$map": {
            "input": ["A","B"],
            "as": "el",
            "in": {
               "$cond": [
                   { "$eq": [ "A", "$$el" ] },
                   "$to",
                   "$from"
               ]
            }
        }
    }
}}

MongoDB 2.4.x and 2.2.x - from fields

{ "$project": {
    "to": 1,
    "from": 1,
    "type": { "$const": [ "A", "B" ] }
}},
{ "$unwind": "$type" },
{ "$group": {
    "_id": "$_id",
    "people": {
        "$addToSet": {
            "$cond": [
                { "$eq": [ "$type", "A" ] },
                "$to",
                "$from"
            ]
        }
    }
}}

But in all cases:

  1. Get all recipients into a distinct array.

  2. Sort the array into a consistent order.

  3. Group on the "always in the same order" list of recipients.

Follow that and you cannot go wrong.
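
The three steps above can be sketched in plain JavaScript for the case where "to" and "from" are already arrays. This is only an in-memory illustration of the logic, not a replacement for the pipeline; the sample documents and the name groupByParticipants are hypothetical.

```javascript
// Hypothetical documents where "to" and "from" are already arrays.
const threads = [
  { from: ["a@example.com"], to: ["b@example.com", "c@example.com"] },
  { from: ["c@example.com"], to: ["a@example.com", "b@example.com"] },
  { from: ["a@example.com"], to: ["b@example.com"] },
];

function groupByParticipants(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // 1. Get all recipients into a distinct array.
    const people = [...new Set([...doc.to, ...doc.from])];
    // 2. Sort the array so the same participants always appear in the same order.
    people.sort();
    // 3. Group on the sorted list (serialized, since a Map compares arrays by reference).
    const key = JSON.stringify(people);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([key, count]) => ({ _id: JSON.parse(key), count }));
}

const participantCounts = groupByParticipants(threads);
console.log(participantCounts);
```

Here the first two documents involve the same three participants in different roles, so they fall into one group with a count of 2, while the two-person exchange forms its own group.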

Blakes Seven answered Mar 27 '23 03:03