
mongo-aggregation: apply regex grouping, string processing on $project

I'd like to apply some simple string manipulation in a $project stage. Is it possible to apply something like the following function there?

var themeIdFromZipUrl = function(zipUrl){
    return zipUrl.match(/.*\/(T\d+)\/.*/)[1]
};

I'm using the following query:

db.clientRequest.aggregate([
  {
    $match: {
      "l": { $regex: ".*zip" },
      "t": {
        "$gte": new Date('1/SEP/2013'),
        "$lte": new Date('7/OCT/2013')
      }
    }
  },
  {
    $project: { "theme_url": "$l", "_id": 0, "time": "$t" }
  },
  {
    $group: {
      _id: {
        theme_url: "$theme_url",
        day: {
          "day": { $dayOfMonth: "$time" },
          "month": { $month: "$time" },
          "year": { $year: "$time" }
        }
      },
      count: { $sum: 1 }
    }
  }
])

This returns the following:

{
    "_id" : {
        "theme_url" : "content/theme/T70/zip",
        "day" : {
            "day" : 13,
            "month" : 9,
            "year" : 2013
        }
    },
    "count" : 2
}

Can I apply the function above to the theme_url field and turn it into theme_id? I took a brief look at Map-Reduce, but I'm not sure whether it's too complicated for such a simple case.

Thanks,

Amit.

Asked by amit, Dec 05 '25 03:12


1 Answer

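When this was asked, the Aggregation Framework had no operator for this. MongoDB 4.2 and later add $regexFind, which can pull a regex capture group out of a string field inside a pipeline stage; on older servers, the options below still apply.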
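
A minimal sketch of the in-pipeline route, assuming MongoDB 4.2 or later (where the $regexFind operator is available) and the l and t fields from the question. The names themeIdPattern, themeIdStages and the temporary field m are made up for illustration:

```javascript
// Assumption: MongoDB 4.2+, where $regexFind exists.
// The pattern mirrors the one in themeIdFromZipUrl from the question.
var themeIdPattern = /\/(T\d+)\//;

var themeIdStages = [
  // Run the regex against the URL field; m is null when nothing matches.
  { $addFields: { m: { $regexFind: { input: "$l", regex: themeIdPattern } } } },
  // Keep the first capture group as theme_id and carry the timestamp along.
  {
    $project: {
      _id: 0,
      theme_id: { $arrayElemAt: ["$m.captures", 0] },
      time: "$t"
    }
  }
];

// Only runs inside a mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.clientRequest.aggregate(themeIdStages);
}
```

In the question's query these two stages would take the place of the $project stage, and the $group stage would then group on "$theme_id" instead of "$theme_url".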

You could do it with MapReduce, but that would probably be slower than the aggregation, noticeably so if the amount of data is large.
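
For reference, a rough sketch of what the MapReduce route could look like. It counts requests per theme id only (the per-day grouping from the question is left out to keep it short), and the names mapThemeId and reduceCounts are hypothetical:

```javascript
// map runs once per matching document: `this` is the document and
// emit(key, value) is provided by the server.
var mapThemeId = function () {
  var m = /\/(T\d+)\//.exec(this.l);
  if (m) {
    emit(m[1], 1); // key: theme id, value: one request
  }
};

// reduce sums the partial counts collected for a single theme id.
var reduceCounts = function (key, values) {
  return values.reduce(function (total, v) { return total + v; }, 0);
};

// Only runs inside a mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.clientRequest.mapReduce(mapThemeId, reduceCounts, {
    query: { l: { $regex: ".*zip" } }, // same URL filter as the $match stage
    out: { inline: 1 }                 // return results instead of writing a collection
  });
}
```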

If this is the last step of the aggregation, you can also do it on the client side after the aggregation completes, e.g. in the Mongo shell:

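// In MongoDB 2.6 and later aggregate() returns a cursor; toArray() drains it.
var aggregationResults = col.aggregate([ /* aggregation pipeline here */ ]).toArray();
aggregationResults.forEach(function(x) {
  // Use the names from the question: themeIdFromZipUrl and _id.theme_url.
  x._id.theme_id = themeIdFromZipUrl(x._id.theme_url);
});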

If you're using a driver for another language you'll have to do this in whatever language you're using, of course.
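
One thing to watch with the client-side approach: if two different URLs contain the same theme id, their counts need to be merged after the id is extracted. A small sketch of that step in plain JavaScript, assuming rows shaped like the output in the question; regroupByThemeId and the "id|date" key format are made up for illustration:

```javascript
// Same extraction as in the question, but returns null instead of throwing
// when the URL has no /T<digits>/ segment.
function themeIdFromZipUrl(zipUrl) {
  var m = /.*\/(T\d+)\/.*/.exec(zipUrl);
  return m ? m[1] : null;
}

// rows: [{ _id: { theme_url, day: { day, month, year } }, count }, ...]
// Returns an object mapping "themeId|year-month-day" to the summed count.
function regroupByThemeId(rows) {
  var totals = {};
  rows.forEach(function (row) {
    var d = row._id.day;
    var key = themeIdFromZipUrl(row._id.theme_url) +
      "|" + d.year + "-" + d.month + "-" + d.day;
    totals[key] = (totals[key] || 0) + row.count;
  });
  return totals;
}

// The single result row shown in the question.
var sample = [
  {
    _id: {
      theme_url: "content/theme/T70/zip",
      day: { day: 13, month: 9, year: 2013 }
    },
    count: 2
  }
];

regroupByThemeId(sample); // { "T70|2013-9-13": 2 }
```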

Generally speaking, if your data contains a theme_url and the theme_id is encoded in the URL, it might make sense to store it in its own field. Mongo is not a very good tool for text manipulation.
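
A sketch of what storing the id in its own field could look like at write time. The helper name withThemeId is hypothetical, and it assumes the URL lives in the l field as in the question:

```javascript
// Returns a copy of the document with theme_id added when the URL in `l`
// contains a /T<digits>/ segment; otherwise the copy is unchanged.
function withThemeId(doc) {
  var m = /\/(T\d+)\//.exec(doc.l || "");
  var copy = Object.assign({}, doc);
  if (m) {
    copy.theme_id = m[1];
  }
  return copy;
}

// In a mongo shell the result would be what gets written, for example:
//   db.clientRequest.insertOne(withThemeId({ l: "content/theme/T70/zip", t: new Date() }));
var stored = withThemeId({ l: "content/theme/T70/zip" });
// stored.theme_id is "T70"
```

With the field in place, the $group stage can use "$theme_id" directly and no string processing is needed at query time.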

Answered by Avish, Dec 07 '25 15:12


