I have a data set looks as
{"BrandId":"a","SessionId":100,"UserName":"tom"}
{"BrandId":"a","SessionId":200,"UserName":"tom"}
{"BrandId":"b","SessionId":300,"UserName":"mike"}
I would like to count distinct session and username group by brandid, the sample sql is like:
select brandid,count_distinct(sessionid),count_distinct(username)
from data
group by brandid
I tried to write Mongo DB, my current code is as following and it does not work. Is there anyway to make it work?
db.logs.aggregate([
{$group:{
_id:{brand:"$BrandId",user:"$UserName",session:"$SessionId"},
count:{$sum:1}}},
{$group:{
_id:"$_id.brand",
users:{$sum:"$_id.user"},
sessions:{$sum:"$_id.session"}
}}
])
for the certain example, the expected count is
{"BrandId:"a","countSession":2,"countUser":1}
{"BrandId:"b","countSession":1,"countUser":1}
if you know SQL, the expect result is as same as the SQL I mentioned.
You can do this by using $addToSet
to accumulate the distinct set of SessionId
and UserName
values during the $group
, and then adding a $project
stage to your pipeline that uses the $size
operator to get the size of each set:
db.logs.aggregate([
{$group: {
_id: '$BrandId',
sessionIds: {$addToSet: '$SessionId'},
userNames: {$addToSet: '$UserName'}
}},
{$project: {
_id: 0,
BrandId: '$_id',
countSession: {$size: '$sessionIds'},
countUser: {$size: '$userNames'}
}}
])
Result:
{
"BrandId" : "b",
"countSession" : 1,
"countUser" : 1
},
{
"BrandId" : "a",
"countSession" : 2,
"countUser" : 1
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With