Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using stored JavaScript functions in the Aggregation pipeline, MapReduce or runCommand

Is there a way to use a user-defined function saved as db.system.js.save(...) in pipeline or mapreduce?

like image 632
Prakash Thapa Avatar asked Jul 18 '15 23:07

Prakash Thapa


People also ask

Can mapReduce be used for aggregation?

Map-reduce operations can be rewritten using aggregation pipeline operators, such as $group , $merge , and others. For map-reduce operations that require custom functionality, MongoDB provides the $accumulator and $function aggregation operators starting in version 4.4.

What is difference between mapReduce and aggregation?

Map-reduce is a common pattern when working with Big Data – it's a way to extract info from a huge dataset. But now, starting with version 2.2, MongoDB includes a new feature called Aggregation framework. Functionality-wise, Aggregation is equivalent to map-reduce but, on paper, it promises to be much faster.

Which functionality is used for aggregation framework?

8. Which of the following functionality is used for aggregation framework? Explanation: For related projection functionality in the aggregation framework pipeline, use the $project pipeline stage.

What are aggregation pipeline stages?

Writes the resulting documents of the aggregation pipeline to a collection. The stage can incorporate (insert new documents, merge documents, replace documents, keep existing documents, fail the operation, process documents with a custom update pipeline) the results into an output collection.


1 Answers

Any function you save to system.js is available for usage by "JavaScript" processing statements such as the $where operator and mapReduce and can be referenced by the _id value is was asssigned.

db.system.js.save({ 
   "_id": "squareThis", 
   "value": function(a) { return a*a } 
})

And some data inserted to "sample" collection:

{ "_id" : ObjectId("55aafd2bacbed38e06f9eccf"), "a" : 1 }
{ "_id" : ObjectId("55aafea6acbed38e06f9ecd0"), "a" : 2 }
{ "_id" : ObjectId("55aafeabacbed38e06f9ecd1"), "a" : 3 }

Then:

db.sample.mapReduce(
    function() {
       emit(null, squareThis(this.a));
    },
    function(key,values) {
        return Array.sum(values);
    },
    { "out": { "inline": 1 } }
 );

Gives:

   "results" : [
            {
                    "_id" : null,
                    "value" : 14
            }
    ],

Or with $where:

db.sample.find(function() { return squareThis(this.a) == 9 })
{ "_id" : ObjectId("55aafeabacbed38e06f9ecd1"), "a" : 3 }

But in "neither" case can you use globals such as the database db reference or other functions. Both $where and mapReduce documentation contain information of the limits of what you can do here. So if you thought you were going to do something like "look up data in another collection", then you can forget it because it is "Not Allowed".

Every MongoDB command action is actually a call to a "runCommand" action "under the hood" anyway. But unless what that command is actually doing is "calling a JavaScript processing engine" then the usage becomes irrelevant. There are only a few commands anyway that do this, being mapReduce, group or eval, and of course the find operations with $where.


The aggregation framework does not use JavaScript in any way at all. You might be mistaking just as others have done a statement like this, which does not do what you think it does:

db.sample.aggregate([
    { "$match": {
        "a": { "$in": db.sample.distinct("a") }
    }}
])

So that is "not running inside" the aggregation pipeline, but rather the "result" of that .distinct() call is "evaluated" before the pipeline is sent to the server. Much as with an external variable is done anyway:

var items = [1,2,3];
db.sample.aggregate([
    { "$match": {
        "a": { "$in": items }
    }}
])

Both essentially send to the server in the same way:

db.sample.aggregate([
    { "$match": {
        "a": { "$in": [1,2,3] }
    }}
])

So it is "not possible" to "call" any JavaScript function in the aggregation pipeline, nor is there really any point is "passing in" results in general from something saved in system.js. The "code" needs to be "loaded to the client" and only a JavaScript engine can actually do anything with it.

With the aggregation framework, all of the "operators" available are actually natively coded functions as opposed to the "free form" JavaScript interpretation provided for mapReduce. So instead of writing "JavaScript", you use the operators themselves:

db.sample.aggregate([
    { "$group": {
        "_id": null,
        "sqared": { "$sum": {
           "$multiply": [ "$a", "$a" ]
        }}
    }}
])

{ "_id" : null, "sqared" : 14 }

So there are limitations on what you can do with functions saved in system.js, and the chances are that what you want to do is either:

  • Not allowed, such as accessing data from another collection
  • Not really required as the logic is generally self contained anyway
  • Or probably better implemented in client logic or other different form anyway

Just about the only practical use I can really think of is that you have a number of "mapReduce" operations that cannot be done any other way and you have various "shared" functions that you would rather just store on the server than maintain within every mapReduce function call.

But then again, the 90% reason for mapReduce over the aggregation framework is usually that the "document structure" of the collections has been poorly chosen and the JavaScript functionality is "required" to traverse the document for search and analysis.

So you can use it under the allowed constraints, but in most cases you probably should not be using this at all, but fixing the other issues that caused you to believe you needed this feature in the first place.

like image 186
Blakes Seven Avatar answered Nov 08 '22 19:11

Blakes Seven