I store our web server logs in MongoDB, and the schema looks roughly like this:
[
{
"_id" : 12345,
"url" : "http://www.mydomain.com/xyz/abc.html",
....
},
....
]
I am trying to use the $project operator to reshape this schema a little before passing my collection through an aggregation pipeline. Basically, I need to add a new field called "type" that will later be used for a group-by. The logic for the new field is pretty simple:
if "url" contains "pattern_A" then set "type" = "sales lead";
else if "url" contains "pattern_B" then set "type" = "existing client";
...
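For reference, the rule chain above can be written as a plain JavaScript function. This is only a sketch: "pattern_A" and "pattern_B" are the placeholders from the rules above, while the function name classifyUrl and the null fallback for unmatched URLs are assumptions made for illustration.

```javascript
// Ordered rules: the first matching pattern wins, as in the if / else-if chain.
// Further rules (the "..." above) would be appended to this list.
const rules = [
  { pattern: "pattern_A", type: "sales lead" },
  { pattern: "pattern_B", type: "existing client" },
];

// Returns the type for a URL, or null when no pattern matches
// (the null fallback is an assumption, the rules do not specify one).
function classifyUrl(url) {
  if (typeof url !== "string") {
    return null;
  }
  for (const rule of rules) {
    if (url.includes(rule.pattern)) {
      return rule.type;
    }
  }
  return null;
}
```

The question is how to express the same first-match-wins logic inside a $project stage.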
I'm thinking it would have to be something like this:
db.weblog.aggregate(
{
$project : {
type : { /* how to implement the logic??? */ }
}
}
);
I know how to do this using map-reduce (by setting the "keyf" attribute to a custom JS function that implements the above logic) but am now trying to use the new aggregation framework to do this. I tried to implement the logic using the expression operators but so far couldn't get it to work. Any help/suggestion would be greatly appreciated!
The $project stage takes a document that can specify the inclusion of fields, the suppression of the _id field, the addition of new fields, and the resetting of the values of existing fields. Alternatively, you may specify the exclusion of fields.
$group groups input documents by the specified _id expression and outputs one document for each distinct grouping. $project passes the documents along, with only the requested fields, to the next stage in the pipeline.
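A minimal sketch of a $project stage feeding a $group stage for the "type" field from the question. It assumes MongoDB 4.2 or later, where the $switch and $regexMatch expression operators are both available (they did not exist when the question was asked). The helper name typeExpression and the "other" fallback are hypothetical.

```javascript
// Ordered rules taken from the question; the patterns are its placeholders.
const rules = [
  { pattern: "pattern_A", type: "sales lead" },
  { pattern: "pattern_B", type: "existing client" },
];

// Builds a $switch expression: branches are tested in order and the first
// case that evaluates to true supplies the value. Each pattern is passed to
// $regexMatch as a regular expression, so metacharacters would need escaping.
function typeExpression(rules, fallback) {
  return {
    $switch: {
      branches: rules.map((r) => ({
        case: { $regexMatch: { input: "$url", regex: r.pattern } },
        then: r.type,
      })),
      default: fallback, // used when no branch matches
    },
  };
}

const pipeline = [
  // Keep url and compute the new field.
  { $project: { url: 1, type: typeExpression(rules, "other") } },
  // Count documents per computed type.
  { $group: { _id: "$type", count: { $sum: 1 } } },
];

// In the mongo shell this would be run as: db.weblog.aggregate(pipeline)
```

Building the expression from a rule list keeps the pattern-to-type mapping in one place, so adding a rule does not mean hand-editing nested operators.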
You can include one or more $set stages in an aggregation operation. To add a field or fields to embedded documents (including documents in arrays), use dot notation.
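As a rough illustration, a $set stage can add "type" while leaving every other field in place. The sketch below uses nested $cond with $indexOfCP for a literal substring test instead of a regular expression. $set needs MongoDB 4.2+; the equivalent $addFields stage and $indexOfCP are available from 3.4. The helper name nestedCond and the "other" fallback are assumptions.

```javascript
// Ordered rules taken from the question; the patterns are its placeholders.
const rules = [
  { pattern: "pattern_A", type: "sales lead" },
  { pattern: "pattern_B", type: "existing client" },
];

// Folds the rules from the right into nested $cond expressions:
//   if url contains rules[0].pattern then rules[0].type
//   else if url contains rules[1].pattern then rules[1].type
//   else fallback
// $indexOfCP returns -1 when the substring is absent, so ">= 0" means "contains".
function nestedCond(rules, fallback) {
  return rules.reduceRight(
    (otherwise, r) => ({
      $cond: [
        { $gte: [{ $indexOfCP: ["$url", r.pattern] }, 0] },
        r.type,
        otherwise,
      ],
    }),
    fallback
  );
}

// Adds "type" and keeps all existing fields of each document.
const setStage = { $set: { type: nestedCond(rules, "other") } };

// In the mongo shell: db.weblog.aggregate([setStage])
```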
What is Aggregation in MongoDB? Aggregation is a way of processing a large number of documents in a collection by means of passing them through different stages. The stages make up what is known as a pipeline. The stages in a pipeline can filter, sort, group, reshape and modify documents that pass through the pipeline.
I am sharing my "solution" in case others run into the same need.
After researching for a couple of weeks, and as @asya-kamsky suggested in a comment, I've decided to add a computed field to my original MongoDB schema. It's not ideal, because whenever the logic for the computed field changes I will have to run bulk updates across all documents in my collection, but it was either that or rewrite my code to use MapReduce. I chose the former for now. Looking at the MongoDB Jira board, it appears that many people have asked for more diverse operators to be added to $project, and I certainly hope the MongoDB dev team gets around to adding them sooner rather than later.
Operator for splitting string based on a separator.
New projection operator $elemMatch
Allow $slice operator in $project
add a $inOrder operator to $project
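A minimal sketch of how the stored computed field described above could be backfilled, or refreshed after the rules change. It builds one updateMany operation per rule for db.collection.bulkWrite(). The rules are applied in reverse priority order, so that a higher-priority rule overwrites a lower one when a URL matches both. The helper names are hypothetical, and the patterns are the question's placeholders.

```javascript
// Ordered rules taken from the question; the first rule has highest priority.
const rules = [
  { pattern: "pattern_A", type: "sales lead" },
  { pattern: "pattern_B", type: "existing client" },
];

// Escapes regex metacharacters so each pattern is matched as a literal substring.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// One updateMany per rule, lowest priority first. With an ordered bulk write
// the operations run serially, so the highest-priority rule is applied last.
function buildBackfillOps(rules) {
  return rules
    .slice() // copy so the caller's array is not reversed in place
    .reverse()
    .map((r) => ({
      updateMany: {
        filter: { url: { $regex: escapeRegex(r.pattern) } },
        update: { $set: { type: r.type } },
      },
    }));
}

const ops = buildBackfillOps(rules);

// In the mongo shell: db.weblog.bulkWrite(ops)   (ordered by default)
```

Documents matching no rule are left untouched by this sketch, so a stale "type" value would have to be cleared separately if the patterns are narrowed.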