
MongoDB aggregate, group and count instances

I have a document which looks like this:

{
    "_id" : ObjectId("527a6b7c24a8874c078b9d10"),
    "day" : 6,
    "hour" : 15,
    "hourlyLocations" : [
        {
            "countryName" : "Spain",
            "countryCode" : "ES",
            "cityName" : "Madrid",
            "latitude" : 40,
            "longitude" : -4
        },
        {
            "countryName" : "United Kingdom",
            "countryCode" : "GB",
            "cityName" : "Soest",
            "latitude" : 51.5,
            "longitude" : -0.13
        }
    ],
    "minute" : 18,
    "month" : 11,
    "year" : 2013
}

"hourlyLocations" is a series of embedded documents (just two shown here for brevity).

I'm trying to run an aggregation which will return each country, all the cities in that country (once) and the number of instances of each city.

Here's what I've got so far:

db.hourly.aggregate(
[
    { "$project" : { "hourly" : "$hourlyLocations" } },
    { "$unwind" : "$hourly" },
    { "$group" : { "_id" : { "country" : "$hourly.countryName" }, "city" : { "$push" : "$hourly.cityName" } } },
]
)

This returns something like:

{
        "_id" : {
            "country" : "Italy"
        },
        "city" : [
            "Manzano",
            "Cologno Monzese",
            "Rome",
            "Manzano",
            "Cologno Monzese",
            "Venice",
            "Milan",
            "Rome",
            "Milan",
            "Manzano",
            "Cologno Monzese",
            "Venice",
            "Milan",
            "Rome",
            "Milan",
            "Manzano",
            "Cologno Monzese",
            "Venice",
            "Milan",
            "Rome",
            "Manzano",
            "Cologno Monzese",
            "Venice",
            "Milan",
            "Casalnuovo di Napoli",
            "Manzano",
            "Cologno Monzese",
            "Venice",
            "Milan",
            "Casalnuovo di Napoli",
            "Milan"
        ]
    }

So I've got all the instances of all the cities grouped by country. What I want to do now is group the cities within each country and count the number of instances of each one. Something like this:

{
        "_id" : {
            "country" : "Italy"
        },
        "city" : [
            "Casalnuovo di Napoli" : "12",
            "Cologno Monzese" : "10",
            "Manzano" : "9",
            "Milan" : "6",
            "Rome" : "3",
            "Venice" : "1"
        ]
    }

I've tried a few things but haven't been able to get it right. How can I get the count of each city per country as I require?

Many thanks,

Nick.

dev-null asked Nov 11 '13 14:11



1 Answer

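Try counting each (country, city) pair first, then regrouping by country:

db.hourly.aggregate(
[
    { "$project" : { "hourly" : "$hourlyLocations" } },
    { "$unwind" : "$hourly" },
    // Count the occurrences of each (country, city) pair.
    { "$group" : { "_id" : { "country" : "$hourly.countryName", "city" : "$hourly.cityName" }, "count" : { "$sum" : 1 } } },
    // Sort so the most frequent cities come first.
    { "$sort" : { "count" : -1 } },
    // Regroup by country, collecting each city together with its count.
    { "$group" : { "_id" : "$_id.country", "cities" : { "$push" : { "city" : "$_id.city", "count" : "$count" } } } }
]
)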

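It's not quite the requested structure. Instead, each country gets an array of city/count documents, and "_id" is the plain country name because the last $group uses "$_id.country":

{
    "_id" : "Italy",
    "cities" : [
        { "city" : "Cologno Monzese", "count" : 12 },
        { "city" : "Milan", "count" : 6 },
        { "city" : "Rome", "count" : 3 }
    ]
}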
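
If the exact shape from the question matters (city names as keys, counts as values), one option is to reshape the result on the client after the aggregation returns. The following is only a minimal sketch in plain JavaScript: the `results` array is hypothetical sample data shaped like the pipeline's output, and `toCityCountMap` is an illustrative helper name, not part of MongoDB or any driver.

```javascript
// Hypothetical sample shaped like the pipeline's output: "_id" holds the
// country name because the final $group uses _id: "$_id.country".
const results = [
  {
    _id: "Italy",
    cities: [
      { city: "Cologno Monzese", count: 12 },
      { city: "Milan", count: 6 },
      { city: "Rome", count: 3 }
    ]
  }
];

// Illustrative helper (assumed name): turn one country's
// [{ city, count }] array into a { cityName: count } object.
function toCityCountMap(doc) {
  const city = {};
  for (const entry of doc.cities) {
    city[entry.city] = entry.count;
  }
  return { _id: { country: doc._id }, city: city };
}

const reshaped = results.map(toCityCountMap);
console.log(JSON.stringify(reshaped, null, 2));
```

Since the keys are added in array order, the descending-count order from the $sort stage carries over to the object. On MongoDB 3.4.4 or later, the $arrayToObject operator should allow a similar reshaping inside the pipeline itself, though that variant is not shown here.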
rvidal answered Oct 20 '22 19:10
