I have records in a database that contain URLs, for example https://www.youtube.com/watch?v=blablabla.
I want to count the URLs for each site. For example:
[{
site: 'youtube.com',
count: 25
},
{
site: 'facebook.com',
count: 135
}]
I used this aggregation pipeline:
db.getCollection('records').aggregate([
{'$match': {'url': /.*youtube\.com.*/}}, // youtube for example
{'$group': {'_id': {'site': '$url', 'count': {'$sum': 1}}}},
{'$project': {'_id': false, 'site': '$_id.site', 'count': '$_id.count'}}
]);
which outputs:
[{
"site" : "youtube.com/blablabla1",
"count" : 1.0
},
{
"site" : "youtube.com",
"count" : 1.0
},
{
"site" : "www.youtube.com/blablabla2",
"count" : 1.0
},
{
"site" : "www.youtube.com/blablabla1",
"count" : 1.0
}]
It won't even count identical strings correctly.
What is wrong with my approach?
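As far as I can tell, the problem is that `$sum` sits inside `_id`. There it is evaluated as part of the grouping key rather than as an accumulator, so documents are grouped by their full URL and every group carries a constant count of 1. Below is a minimal sketch of the same pipeline with the accumulator moved out of `_id`. It assumes one site per `$match`, as in the question, and the constant `'youtube.com'` key is my own choice for illustration.

```javascript
// Sketch of a corrected single-site pipeline (field and collection names taken from the question)
const pipeline = [
  { $match: { url: /youtube\.com/ } },
  // _id is only the grouping key; count is a separate accumulator field
  { $group: { _id: 'youtube.com', count: { $sum: 1 } } },
  { $project: { _id: false, site: '$_id', count: true } }
]

// In the mongo shell this would be run as:
// db.getCollection('records').aggregate(pipeline)
```

This still handles only one site per query, because the `$match` stage filters for a single host.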
This will count all websites.
The website name is determined by this regex:
const testData = ['https://www.youtube.com/watch?v=UbQgXeY_zi4&list=RDUbQgXeY_zi4&index=1', 'https://www.facebook.com/maciej.kozieja.9', 'http://example.com', 'http://www.example.com']
const sites = testData.map(site => (site + '/').match(/(?:https?:\/\/)?(?:www\.)?([\w.]+)(?=\/)/)[1])
console.log(sites)
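Note that the regex above only accepts word characters and dots in the host, so a hyphenated host such as my-site.com comes out as site.com, and a string with no match makes `[1]` throw. A slightly more defensive variant could look like the sketch below. The helper name `siteOf` is hypothetical, and the pattern is an assumption about which URL shapes appear in the data, not a full URL parser.

```javascript
// Hypothetical helper: returns the host without scheme and "www.", or null if nothing matches
function siteOf(url) {
  // Host may contain letters, digits, underscores, dots and hyphens; an optional port is skipped
  const pattern = /^(?:https?:\/\/)?(?:www\.)?([\w.-]+)(?::\d+)?(?=[\/?#]|$)/i
  const match = pattern.exec(String(url))
  return match ? match[1].toLowerCase() : null
}

console.log(siteOf('https://www.youtube.com/watch?v=UbQgXeY_zi4')) // youtube.com
console.log(siteOf('http://my-site.com')) // my-site.com
console.log(siteOf('')) // null
```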
Then we have to use the mapReduce function on our collection:
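db.collection('links').mapReduce(
function () {
// `site` is the field holding the URL here; the question's documents call it `url`
emit((this.site + '/').match(/(?:https?:\/\/)?(?:www\.)?([\w.]+)(?=\/)/)[1], 1)
},
function (key, values) {
// reduce can be called again with its own partial results, so sum the values instead of counting them
return values.reduce(function (sum, value) { return sum + value }, 0)
}, { out: 'websiteLinksCount' }
)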
Then we can do something with the result by chaining onto the mapReduce call:
.then(collection => {
collection.find({}).toArray((error, docs) => {
console.log(docs) // here you have an array of { _id: siteName, value: count }
})
})
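One detail worth keeping in mind: MongoDB may call the reduce function more than once for the same key and feed earlier reduce results back in as values, so the reduce step should add the values up rather than count them. The in-memory sketch below illustrates the difference. It involves no database, and `reduceByLength`, `reduceBySum` and `rereduce` are hypothetical names used only for this illustration.

```javascript
// Two candidate reduce functions for the same key
const reduceByLength = (key, values) => values.length
const reduceBySum = (key, values) => values.reduce((sum, v) => sum + v, 0)

// Simulate a re-reduce: five emitted 1s are reduced in two batches,
// then the two partial results are reduced again.
function rereduce(reduceFn, key) {
  const firstBatch = reduceFn(key, [1, 1, 1])
  const secondBatch = reduceFn(key, [1, 1])
  return reduceFn(key, [firstBatch, secondBatch])
}

console.log(rereduce(reduceByLength, 'youtube.com')) // 2
console.log(rereduce(reduceBySum, 'youtube.com')) // 5
```

Counting the values gives 2 for five URLs once the partial results are combined, while summing gives the expected 5.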