
Using $regex in the MongoDB aggregation framework in $group

Consider the following example:

db.article.aggregate(
  { $group : {
      _id : "$author",
      docsPerAuthor : { $sum : 1 },
      viewsPerAuthor : { $sum : "$pageViews" }
  }}
);

This groups by the author field and computes two fields.

My author values have the form FirstName_LastName. Instead of grouping by $author, I want to group all authors who share the same LastName.

I tried a regex to group by the part of the string after the '_':

$author.match(/_[a-zA-Z0-9]+$/)

db.article.aggregate(
  { $group : {
      _id : "$author".match(/_[a-zA-Z0-9]+$/),
      docsPerAuthor : { $sum : 1 },
      viewsPerAuthor : { $sum : "$pageViews" }
  }}
);

I also tried the following:

db.article.aggregate(
  { $group : {
      _id : {$author: {$regex: /_[a-zA-Z0-9]+$/}},
      docsPerAuthor : { $sum : 1 },
      viewsPerAuthor : { $sum : "$pageViews" }
  }}
);
asked Feb 09 '13 07:02 by user1447121




2 Answers

Use mapReduce, which is the more general form of aggregation. In the mongo shell, proceed as follows. First define the map function:

var mapFunction = function() {
  // Group key: the "_LastName" suffix of the author field
  var match = this.author.match(/_[a-zA-Z0-9]+$/);
  if (match === null) {
    return; // skip authors that do not follow FirstName_LastName
  }
  var value = {
    docsPerAuthor: 1,
    viewsPerAuthor: this.pageViews // numeric, as in the $sum of the question
  };

  emit( match[0], value );
};

Then define the reduce function:

var reduceFunction = function(key, values) {
  // Same shape as the emitted value, so the result can be reduced again
  var reducedObject = {
    docsPerAuthor: 0,
    viewsPerAuthor: 0
  };

  values.forEach( function(value) {
    reducedObject.docsPerAuthor += value.docsPerAuthor;
    reducedObject.viewsPerAuthor += value.viewsPerAuthor;
  });
  return reducedObject;
};

Run mapReduce on the article collection and save the result in the map_reduce_result collection:

>db.article.mapReduce(mapFunction, reduceFunction, {out:'map_reduce_result'})

Then query map_reduce_result to see the result:

>db.map_reduce_result.find()
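To check the logic without a server, the two steps can be simulated in plain JavaScript. This is only a rough sketch: the sample documents and the `emit` stand-in are made up for illustration, and `pageViews` is assumed to be a number, as the `$sum` in the question suggests.

```javascript
// Hypothetical sample documents; field names follow the question.
const articles = [
  { author: "John_Smith", pageViews: 10 },
  { author: "Jane_Smith", pageViews: 5 },
  { author: "Ann_Lee", pageViews: 7 },
];

// Stand-in for the server-side emit(): collects values per key.
const emitted = new Map();
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

// Keyed on the "_LastName" suffix, like the map function above.
function mapFunction() {
  const match = this.author.match(/_[a-zA-Z0-9]+$/);
  if (match === null) return; // no "_LastName" part, nothing to emit
  emit(match[0], { docsPerAuthor: 1, viewsPerAuthor: this.pageViews });
}

// Sums both counters, like the reduce function above.
function reduceFunction(key, values) {
  const reduced = { docsPerAuthor: 0, viewsPerAuthor: 0 };
  values.forEach(function (value) {
    reduced.docsPerAuthor += value.docsPerAuthor;
    reduced.viewsPerAuthor += value.viewsPerAuthor;
  });
  return reduced;
}

// Map phase: call the map function with each document as `this`.
articles.forEach((doc) => mapFunction.call(doc));

// Reduce phase: one reduced value per distinct key.
const result = {};
for (const [key, values] of emitted) {
  result[key] = reduceFunction(key, values);
}
console.log(result);
```

Note that the key keeps the leading underscore (for example `_Smith`), because the regex matches it. Use `match[0].slice(1)` as the key if you want the bare last name.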
answered Sep 29 '22 15:09 by innoSPG


As far as I can tell, no operator provides this functionality, or at least I could not find a version that includes it. I don't think it will work with $regex (http://docs.mongodb.org/manual/reference/operator/regex/), which is only for pattern matching in queries.
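It may help to see why the first attempt from the question cannot work. The expression `"$author".match(...)` is ordinary JavaScript, so the shell evaluates it on the client against the literal string `"$author"` before anything reaches the server. A small sketch in plain JavaScript (the `groupKey` and `stage` names are only for illustration):

```javascript
// Evaluated on the client: the literal "$author" has no underscore,
// so String.prototype.match returns null.
const groupKey = "$author".match(/_[a-zA-Z0-9]+$/);
console.log(groupKey);

// The stage that would be sent therefore has a constant null key.
const stage = { $group: { _id: groupKey, docsPerAuthor: { $sum: 1 } } };
console.log(JSON.stringify(stage));
```

With a constant `_id`, `$group` puts every document into one single group, which is not the per-LastName grouping the question asks for.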

There is an improvement request in JIRA: https://jira.mongodb.org/browse/SERVER-6773

It is open and unresolved. However,

on GitHub I found this discussion: https://github.com/mongodb/mongo/pull/336

And if you check this commit: https://github.com/nleite/mongo/commit/2dd175a5acda86aaad61f5eb9dab83ee19915709

it contains more or less exactly the operator you are looking for. I do not really understand the status of this improvement, though: it does not work in 2.2.3.
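For readers on a much newer server: later releases added string expressions to the aggregation pipeline, and as far as I know `$split` (3.4 and up) together with `$arrayElemAt` can build the group key without a regex. The following is only a sketch under that assumption. It is not the operator from the commit above, it does not work on 2.2.3, and it assumes every author value is a string of the form FirstName_LastName.

```javascript
// Assumes a server that supports $split and $arrayElemAt (not 2.2.3).
const pipeline = [
  {
    $group: {
      // "First_Last" -> ["First", "Last"] -> "Last" (index -1 = last element)
      _id: { $arrayElemAt: [{ $split: ["$author", "_"] }, -1] },
      docsPerAuthor: { $sum: 1 },
      viewsPerAuthor: { $sum: "$pageViews" },
    },
  },
];

// In the shell this would be run as: db.article.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

Unlike the regex in the question, this key does not include the leading underscore.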

answered Sep 29 '22 14:09 by attish