
How to use variables in MongoDB Map-reduce map function

Tags:

mongodb

Given a document such as:

{_id:110000, groupings:{A:'AV',B:'BV',C:'CV',D:'DV'},coin:{old:10,new:12}}

My specs require the grouping and aggregation attributes to be chosen at run time: the groupings a user is interested in are not known up front but are specified by the user when the query runs.

For example, one user might specify [A,B], which should produce map emissions of

emit( {A:this.groupings.A,B:this.groupings.B},this.coin )

while another might specify [A,C], which should produce map emissions of

emit( {A:this.groupings.A,C:this.groupings.C},this.coin )

Because the map and reduce functions execute on the server and don't have access to client-side variables, I haven't found a way to use a variable key in the map function.

If the map function could reference a list of fields to group by from its own execution scope, this would all be very straightforward. However, because the map function runs in a different scope from my client code, I don't know how to do this, or whether it's even possible.

Before I start dynamically building JavaScript to execute through the driver, does anyone have a better suggestion? Would a 'group' function handle this scenario better?

Asked Sep 01 '11 at 16:09 by gbegley
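To make the "straightforward" case from the question concrete: if a list of fields were visible inside the map function, the key could be built generically instead of hard-coding A, B or C. The sketch below is plain JavaScript that can be run in Node.js; `fields` is a hypothetical name for the user's selection, and the local `emit` only stands in for the one MongoDB provides on the server.

```javascript
// `fields` is a hypothetical name for the user's runtime choice, e.g. ['A', 'B'].
// Here it is an ordinary variable so the function can be tried locally; in a
// real mapReduce call it has to reach the server separately (see the answers).
var fields = ['A', 'B'];

// Map function that builds its key from whatever fields were requested.
function m() {
  var key = {};
  for (var i = 0; i < fields.length; i++) {
    key[fields[i]] = this.groupings[fields[i]];
  }
  emit(key, this.coin);
}

// Local stand-in for the server-provided emit(), for demonstration only.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

var doc = { _id: 110000, groupings: { A: 'AV', B: 'BV', C: 'CV', D: 'DV' }, coin: { old: 10, new: 12 } };
m.call(doc); // `this` inside m is the document, as it is for a real map function
// emitted → [{ key: { A: 'AV', B: 'BV' }, value: { old: 10, new: 12 } }]
```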


2 Answers

You can pass global, read-only data into map-reduce functions using the "scope" parameter on the map-reduce command. It's not very well documented, I'm afraid.

Answered Oct 11 '22 at 23:10 by Dave Griffith
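A rough way to see why client variables are invisible and why `scope` helps: the shell or driver sends map and reduce to the server as source code, and the server evaluates that source in its own JavaScript context, so anything the function closed over on the client is lost. The Node.js sketch below imitates this locally by rebuilding a function from its source text with `new Function`. `callShipped` is a hypothetical helper and this is only an analogy, not how mongod actually executes JavaScript.

```javascript
// Hypothetical helper: rebuild `fn` from its source text, much as a function
// sent to the server loses its client-side closure. Only the names in `scope`
// (plus true globals) are visible to the rebuilt function; `self` becomes `this`.
function callShipped(fn, scope, self) {
  var names = Object.keys(scope);
  var values = names.map(function (n) { return scope[n]; });
  var rebuilt = new Function(names.join(','), 'return (' + fn.toString() + ');')
    .apply(null, values);
  return rebuilt.call(self);
}

// The map function closes over a client-side local `fields`.
function makeMap() {
  var fields = ['A', 'B'];
  return function () {
    var key = {};
    fields.forEach(function (f) { key[f] = this.groupings[f]; }, this);
    emit(key, this.coin);
  };
}

var doc = { _id: 110000, groupings: { A: 'AV', B: 'BV', C: 'CV', D: 'DV' }, coin: { old: 10, new: 12 } };
var emitted = [];
function collect(key, value) { emitted.push({ key: key, value: value }); }

var map = makeMap();

// 1. Only `emit` in scope: the closure's `fields` is gone after rebuilding.
var error = null;
try {
  callShipped(map, { emit: collect }, doc);
} catch (e) {
  error = e; // a ReferenceError ("fields is not defined" in V8)
}

// 2. `fields` supplied through scope: this works, and the scope value is used.
callShipped(map, { emit: collect, fields: ['A', 'C'] }, doc);
// emitted is now [{ key: { A: 'AV', C: 'CV' }, value: { old: 10, new: 12 } }]
```

The first call fails because `fields` only existed in the closure; the second succeeds and also shows that the scope value, not the client-side `['A', 'B']`, is what the rebuilt function sees.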


As pointed out by @Dave Griffith, you can use the scope parameter of the mapReduce function.

I struggled a bit to figure out how to pass it to the function properly because, as others have pointed out, the documentation is not very detailed. Finally, I realised that mapReduce expects three parameters:

  • the map function
  • the reduce function
  • an options object containing one or more of the parameters defined in the documentation (such as out and scope)

Eventually, I arrived at the following JavaScript code:

// I define a variable external to my map and to my reduce functions
var KEYS = {STATS: "stats"};

function m() {
    // I use my global variable inside the map function
    emit(KEYS.STATS, 1);
}

function r(key, values) {
    // I use a helper function
    return sumValues(values);
}

// Helper function in the global scope
function sumValues(values) {
    var result = 0;
    values.forEach(function(value) {
        result += value;
    });
    return result;
}

db.something.mapReduce(
    m,
    r,
    {
         out: {inline: 1},
         // I use the scope param to pass in my variables and functions
         scope: {
             KEYS: KEYS,
             sumValues: sumValues // of course, you can pass function objects too
         }
    }
);

Answered Oct 12 '22 at 00:10 by Rodrigue
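Applied to the original question, the reducer has more to do than `sumValues`: the map emits `this.coin`, an object, and MongoDB may call reduce again on its own earlier results, so the reducer must return an object with the same shape as the emitted values. A minimal sketch, assuming the `{ old, new }` structure from the question's document; the sample values below are made up for the check.

```javascript
// Hypothetical reducer for the question's emitted values ({ old, new }).
// MongoDB may call reduce again on its own earlier output, so it must
// return an object with the same shape as the values passed to emit().
function r(key, values) {
  var total = { old: 0, new: 0 };
  values.forEach(function (v) {
    total.old += v.old;
    total.new += v.new;
  });
  return total;
}

// Checking the re-reduce property locally with made-up values:
var a = { old: 10, new: 12 }, b = { old: 5, new: 7 }, c = { old: 1, new: 1 };
var once = r({ A: 'AV' }, [a, b, c]);
var twice = r({ A: 'AV' }, [r({ A: 'AV' }, [a, b]), c]);
// once and twice are both { old: 16, new: 20 }
```

In the shell, this `r` would go in the second argument of a call such as `db.something.mapReduce(m, r, { out: { inline: 1 }, scope: { fields: ['A', 'B'] } })`, where `fields` is a hypothetical scope variable holding the user's chosen grouping attributes for `m` to read.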