
Select Group by count and distinct count in same mongodb query

I am trying to do the equivalent of this SQL query in MongoDB:

select campaign_id,campaign_name,count(subscriber_id),count(distinct subscriber_id)
from campaigns group by campaign_id,campaign_name;

This query gives me everything except count(distinct subscriber_id):

db.campaigns.aggregate([
    {$match: {subscriber_id: {$ne: null}}},
    {$group: { 
        _id: {campaign_id: "$campaign_id",campaign_name: "$campaign_name"},
        count: {$sum: 1}
    }}
])

The following query gives me everything except count(subscriber_id):

db.campaigns_logs.aggregate([
    {$match : {subscriber_id: {$ne: null}}},
    {$group : { _id: {campaign_id: "$campaign_id",campaign_name: "$campaign_name",subscriber_id: "$subscriber_id"}}},
    {$group : { _id: {campaign_id: "$campaign_id",campaign_name: "$campaign_name"}, 
                count: {$sum: 1}
              }}
])

But I want both count(subscriber_id) and count(distinct subscriber_id) in the same result.

Rams asked Jul 15 '14 14:07


2 Answers

You were headed in the right direction. Coming from an SQL mindset, it helps to see that "distinct" is really just another way of writing a $group operation in either language. That means you have two group operations happening here and, in aggregation pipeline terms, two pipeline stages.

To visualize, take these simplified documents:

{
    "campaign_id": "A",
    "campaign_name": "A",
    "subscriber_id": "123"
},
{
    "campaign_id": "A",
    "campaign_name": "A",
    "subscriber_id": "123"
},
{
    "campaign_id": "A",
    "campaign_name": "A",
    "subscriber_id": "456"
}

For the given "campaign" combination, the total count and the "distinct" count are "3" and "2" respectively. So the logical approach is to first "group" on the "subscriber_id" values and keep the count of occurrences for each. Then, thinking "pipeline", a second stage "totals" those counts per "campaign" and counts the "distinct" occurrences as a separate number:

db.campaigns.aggregate([
    { "$match": { "subscriber_id": { "$ne": null }}},

    // Count all occurrences
    { "$group": {
        "_id": {
            "campaign_id": "$campaign_id",
            "campaign_name": "$campaign_name",
            "subscriber_id": "$subscriber_id"
        },
        "count": { "$sum": 1 }
    }},

    // Sum all occurrences and count distinct
    { "$group": {
        "_id": {
            "campaign_id": "$_id.campaign_id",
            "campaign_name": "$_id.campaign_name"
        },
        "totalCount": { "$sum": "$count" },
        "distinctCount": { "$sum": 1 }
    }}
])

After the first "group" the output documents can be visualized like this:

{ 
    "_id" : { 
        "campaign_id" : "A", 
        "campaign_name" : "A", 
        "subscriber_id" : "456"
    }, 
    "count" : 1 
}
{ 
    "_id" : { 
        "campaign_id" : "A", 
        "campaign_name" : "A", 
        "subscriber_id" : "123"
    }, 
    "count" : 2
}

So of the "three" documents in the sample, "2" belong to one distinct value and "1" to another. The following stage totals these counts with $sum to get the number of matching documents, and counts the grouped documents themselves to get the distinct number, giving the final result:

{ 
    "_id" : { 
        "campaign_id" : "A", 
        "campaign_name" : "A"
    },
    "totalCount" : 3,
    "distinctCount" : 2
}
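To make the two stages concrete without a database, here is a minimal sketch in plain JavaScript that mimics them on the three sample documents. The `groupBy` helper is hypothetical and written only for this illustration; it is not part of MongoDB, and the real `$group` stage does considerably more.

```javascript
// Sample documents from the answer above.
const docs = [
  { campaign_id: "A", campaign_name: "A", subscriber_id: "123" },
  { campaign_id: "A", campaign_name: "A", subscriber_id: "123" },
  { campaign_id: "A", campaign_name: "A", subscriber_id: "456" },
];

// Hypothetical helper: group rows by a key object and fold each group
// into an accumulator, roughly what a $group stage does.
function groupBy(rows, keyOf, init, step) {
  const groups = new Map();
  for (const row of rows) {
    const key = keyOf(row);
    const k = JSON.stringify(key); // key objects are built in a fixed field order
    if (!groups.has(k)) groups.set(k, { _id: key, ...init() });
    step(groups.get(k), row);
  }
  return [...groups.values()];
}

// Stage 1: one row per (campaign, subscriber), counting occurrences.
const perSubscriber = groupBy(
  docs.filter((d) => d.subscriber_id != null), // stands in for the $match
  (d) => ({
    campaign_id: d.campaign_id,
    campaign_name: d.campaign_name,
    subscriber_id: d.subscriber_id,
  }),
  () => ({ count: 0 }),
  (acc) => { acc.count += 1; }
);

// Stage 2: one row per campaign; sum the counts and count the rows.
const perCampaign = groupBy(
  perSubscriber,
  (r) => ({ campaign_id: r._id.campaign_id, campaign_name: r._id.campaign_name }),
  () => ({ totalCount: 0, distinctCount: 0 }),
  (acc, r) => { acc.totalCount += r.count; acc.distinctCount += 1; }
);

console.log(perCampaign);
```

The second stage never sees the original documents, only the output of the first, which is why adding 1 per row there yields the distinct count.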

A really good analogy for the aggregation pipeline is the Unix pipe "|" operator, which allows "chaining" of operations so you can pass the output of one command through to the input of the next, and so on. Thinking about your processing requirements in that way will help you understand aggregation pipeline operations better.
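The pipe analogy can be sketched in plain JavaScript as a list of functions, each fed the output of the one before it. The names `runPipeline`, `sampleRows` and `distinctSubscribers` are made up for this illustration and the stages only loosely resemble their MongoDB counterparts.

```javascript
// Hypothetical helper: run each stage on the output of the previous one,
// the way `a | b | c` does in a shell.
const runPipeline = (input, stages) =>
  stages.reduce((rows, stage) => stage(rows), input);

// Made-up input rows for the illustration.
const sampleRows = [
  { subscriber_id: "123" },
  { subscriber_id: "123" },
  { subscriber_id: null },
  { subscriber_id: "456" },
];

const distinctSubscribers = runPipeline(sampleRows, [
  (rows) => rows.filter((r) => r.subscriber_id != null),     // like $match
  (rows) => [...new Set(rows.map((r) => r.subscriber_id))],  // like a $group on subscriber_id
  (ids) => ids.length,                                       // count what is left
]);

console.log(distinctSubscribers);
```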

Neil Lunn answered Oct 17 '22 17:10


SQL Query: (group by & count of distinct)

select city,count(distinct(emailId)) from TransactionDetails group by city;

The equivalent mongo query would look like this:

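db.TransactionDetails.aggregate([
    // Collect the distinct emailId values per city
    {$group: {_id: {"CITY": "$cityName"}, uniqueCount: {$addToSet: "$emailId"}}},
    // After $group the key lives under _id, so project it from there
    {$project: {_id: 0, "CITY": "$_id.CITY", uniqueCustomerCount: {$size: "$uniqueCount"}}}
]);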
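
As a rough illustration of what `$addToSet` followed by `$size` computes, the sketch below does the same thing in plain JavaScript with a `Set` per city. It also keeps a running total next to the set, which in a pipeline would correspond to adding a `{$sum: 1}` accumulator to the same `$group`. The sample rows are made up, and note that with `$addToSet` the whole set is held in each group's document, so this approach assumes the number of distinct values per group stays reasonably small.

```javascript
// Made-up sample rows; field names follow the query above.
const transactions = [
  { cityName: "Pune", emailId: "a@example.com" },
  { cityName: "Pune", emailId: "a@example.com" },
  { cityName: "Pune", emailId: "b@example.com" },
  { cityName: "Delhi", emailId: "c@example.com" },
];

const byCity = new Map();
for (const t of transactions) {
  if (!byCity.has(t.cityName)) {
    byCity.set(t.cityName, { totalCount: 0, emails: new Set() });
  }
  const acc = byCity.get(t.cityName);
  acc.totalCount += 1;       // like { $sum: 1 }
  acc.emails.add(t.emailId); // like { $addToSet: "$emailId" }
}

// Like the $project stage: the set's size is the distinct count.
const cityCounts = [...byCity].map(([city, acc]) => ({
  CITY: city,
  totalCount: acc.totalCount,
  uniqueCustomerCount: acc.emails.size,
}));

console.log(cityCounts);
```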

Surendranath Reddy K answered Oct 17 '22 18:10