
MongoDB - Mongoid map reduce basic operation

I have just started with MongoDB and Mongoid. The biggest problem I'm having is understanding the map/reduce functionality well enough to do some very basic grouping.

Let's say I have a model like this:

class Person
  include Mongoid::Document
  field :age, type: Integer
  field :name
  field :sdate
end

That model would produce objects like these:

#<Person _id: 9xzy0, age: 22, name: "Lucas", sdate: "2013-10-07">
#<Person _id: 9xzy2, age: 32, name: "Paul", sdate: "2013-10-07">
#<Person _id: 9xzy3, age: 23, name: "Tom", sdate: "2013-10-08">
#<Person _id: 9xzy4, age: 11, name: "Joe", sdate: "2013-10-08">

Could someone show how to use Mongoid map/reduce to get a collection of those objects grouped by the sdate field, and to get the sum of the ages of those that share the same sdate?

I'm aware of this: http://mongoid.org/en/mongoid/docs/querying.html#map_reduce. But it would help to see that applied to a real example. Where does that code go (in the model, I guess)? Is a scope needed?

I can make a simple search with Mongoid, get the array and manually construct anything I need, but I guess map/reduce is the way to go here. I imagine the JS functions mentioned on the Mongoid page are fed to the DB, which performs those operations internally. Coming from Active Record, these new concepts are a bit strange.

I'm on Rails 4.0, Ruby 1.9.3, Mongoid 4.0.0, MongoDB 2.4.6 on Heroku (mongolab), though locally I have 2.0, which I should update.

Thanks.

Pod asked Oct 10 '13 11:10




1 Answer

The following takes the examples from http://mongoid.org/en/mongoid/docs/querying.html#map_reduce and adapts them to your situation, with comments added to explain each step.

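map = %Q{
  function() {
    // here "this" is the record that map is being executed on;
    // emit partial totals so that reduce can safely combine them
    emit(this.sdate, { sum: this.age, count: 1 });
  }
}

reduce = %Q{
  function(key, values) {
    // this will be executed for every group that has the same sdate value;
    // it must return the same shape that map emits, because MongoDB
    // may feed its output back into reduce
    var result = { sum: 0, count: 0 };
    values.forEach(function(value) {
      result.sum += value.sum;      // sum of all ages
      result.count += value.count;  // total number of people
    });
    return result;
  }
}

finalize = %Q{
  function(key, value) {
    value.avg_of_ages = value.sum / value.count;  // finding the average
    return value;
  }
}

# Each result is a hash such as
# { "_id" => "2013-10-07", "value" => { "sum" => 54.0, "count" => 2.0, "avg_of_ages" => 27.0 } }
results = Person.map_reduce(map, reduce).finalize(finalize).out(inline: 1)

first_average = results.first["value"]["avg_of_ages"]

results.each do |result|
  # do whatever you want with result
end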
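
To see what these phases compute without a database, here is a minimal plain-Ruby sketch that imitates the map, group and reduce steps, plus a final averaging step, on the sample records from the question. The constant PEOPLE and the helper ages_by_sdate_in_memory are hypothetical names made up for this illustration; nothing here is Mongoid API.

```ruby
# Sample data mirroring the Person documents in the question.
PEOPLE = [
  { age: 22, name: "Lucas", sdate: "2013-10-07" },
  { age: 32, name: "Paul",  sdate: "2013-10-07" },
  { age: 23, name: "Tom",   sdate: "2013-10-08" },
  { age: 11, name: "Joe",   sdate: "2013-10-08" }
]

# Hypothetical helper (not part of Mongoid): imitates map/reduce in memory.
def ages_by_sdate_in_memory(people)
  # "map": emit one [key, value] pair per record
  emitted = people.map { |p| [p[:sdate], { sum: p[:age], count: 1 }] }

  # group the emitted pairs by key, as the database does before reducing
  grouped = emitted.group_by(&:first)

  grouped.map do |sdate, pairs|
    # "reduce": combine the partial totals for this key
    total = pairs.map(&:last).reduce do |a, b|
      { sum: a[:sum] + b[:sum], count: a[:count] + b[:count] }
    end
    # final step: derive the average from the combined totals
    { sdate: sdate, sum: total[:sum], avg: total[:sum].to_f / total[:count] }
  end
end

ages_by_sdate_in_memory(PEOPLE).each { |row| puts row.inspect }
```

For the sample data this yields a sum of 54 and an average of 27.0 for 2013-10-07, and a sum of 34 and an average of 17.0 for 2013-10-08.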

That said, I would suggest you use the aggregation framework rather than map/reduce for such a simple operation. The way to do this is as follows:

 results = Person.collection.aggregate([{ "$group" => { "_id" => { "sdate" => "$sdate" },
                                                        "avg_of_ages" => { "$avg" => "$age" } } }])

The result will be almost identical to the map/reduce one, and you will have written a lot less code.
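
Since the question also asks for the sum of ages and where such code might live, here is a hedged sketch of one possible arrangement. The constant AGES_BY_SDATE_PIPELINE and the helper ages_by_sdate are hypothetical names; the helper only assumes it is given an object that responds to aggregate, such as Person.collection. In a Rails app it could equally be written as a class method on the model.

```ruby
# Pipeline that groups documents by sdate and computes both the
# sum and the average of age for each group, sorted by sdate.
AGES_BY_SDATE_PIPELINE = [
  { "$group" => {
      "_id"         => "$sdate",
      "sum_of_ages" => { "$sum" => "$age" },
      "avg_of_ages" => { "$avg" => "$age" }
  } },
  { "$sort" => { "_id" => 1 } }
]

# Hypothetical helper: accepts anything that responds to #aggregate
# (assumed here to be Person.collection) and returns an array of hashes.
def ages_by_sdate(collection)
  collection.aggregate(AGES_BY_SDATE_PIPELINE).to_a
end

# Assumed usage inside the app:
#   ages_by_sdate(Person.collection).each do |row|
#     puts "#{row['_id']}: #{row['sum_of_ages']}"
#   end
```

Keeping the pipeline in a constant makes it easy to inspect or reuse, and no scope is needed because the aggregation runs against the collection directly.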

Makis Tsantekidis answered Sep 28 '22 09:09