
What is the purpose of the org.apache.hadoop.mapreduce.Mapper.run() function in Hadoop?

The setup() method is called before the first call to map(), and cleanup() is called after the last one. The documentation for run() says

Expert users can override this method for more complete control over the execution of the Mapper.

I am looking for the practical purpose of this function.

Asked Sep 18 '11 by Praveen Sripati

People also ask

What is the function of Hadoop MapReduce?

MapReduce is a Hadoop framework for writing applications that process vast amounts of data on large clusters. It can also be described as a programming model for processing large datasets across clusters of computers, with the data stored in distributed form.

What is the role of mapper in Hadoop?

A Hadoop mapper is a function or task that processes the input records from a file and generates output that serves as input to the reducer. It produces that output as new key-value pairs.
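As a concrete illustration (not from the question), the core of the classic word-count map step can be sketched in plain Java, ignoring the Hadoop API types: every token in an input line is emitted as a (word, 1) key-value pair.

```java
import java.util.ArrayList;
import java.util.List;

public class WordCountMapSketch {
    // Emulates the map step: emit one (word, 1) pair per token in the line.
    // Pairs are represented here as "word<TAB>1" strings for simplicity.
    static List<String> map(String line) {
        List<String> pairs = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(token + "\t1");  // this is what the reducer would receive
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(map("the quick brown the"));
    }
}
```

In a real job these pairs would be written via context.write() and grouped by key before reaching the reducer, which would sum the 1s per word.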

What is the role of mapper and reducer in Hadoop platform?

The mapper processes the input data and produces several small chunks of intermediate data. The reduce stage combines the shuffle step and the reduce step: the reducer's job is to process the data that comes from the mapper. After processing, it produces a new set of output, which is stored in HDFS.

What is the purpose of MapReduce explain it with suitable example?

MapReduce is a programming model for performing distributed, parallel processing on a Hadoop cluster, which is a large part of what makes Hadoop fast. When you are dealing with big data, serial processing is no longer practical. MapReduce is divided into two main phase-wise tasks: the map task and the reduce task.


2 Answers

The default run() method simply takes each key/value pair supplied by the context and passes it to the map() method:

public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKeyValue()) {
        map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
    cleanup(context);
}

If you wanted to do more than that, you'd need to override it. For example, the MultithreadedMapper class overrides run() to feed key/value pairs to a pool of worker threads instead of processing them one at a time.
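As a sketch of what "more complete control" can mean in practice (a hypothetical example, not from the answer), a run() override can wrap the default loop with extra behavior, such as logging progress every N records. The structure mirrors the default implementation shown above; this is not runnable on its own since it needs the Hadoop Mapper class around it.

```java
// Hypothetical run() override inside a Mapper subclass: identical to the
// default loop, but logs progress every 10,000 input records.
@Override
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    try {
        long count = 0;
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
            if (++count % 10_000 == 0) {
                System.err.println("processed " + count + " records");
            }
        }
    } finally {
        // try/finally ensures cleanup() runs even if map() throws
        cleanup(context);
    }
}
```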

Answered Sep 23 '22 by Brian Roach

I just came up with a fairly odd case for using this.

Occasionally I've found that I want a mapper that consumes all its input before producing any output. In the past I've done this by performing the record writes in my cleanup() function. My map() function doesn't actually output any records; it just reads the input and stores whatever will be needed in private structures.

It turns out that this approach works fine unless you're producing a LOT of output. As best I can make out, the mapper's spill facility doesn't operate during cleanup(), so the records that are produced just keep accumulating in memory, and if there are too many of them you risk heap exhaustion. That's my speculation about what's going on, and it could be wrong, but the problem definitely goes away with my new approach.

That new approach is to override run() instead of cleanup(). My only change to the default run() is that after the last record has been delivered to map(), I call map() once more with null key and value. That's a signal to my map() function to go ahead and produce its output. In this case, with the spill facility still operable, memory usage stays in check.
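A sketch of that run() override (my reconstruction of the approach described, not the author's actual code): the only change from the default implementation is the extra map() call with null key and value after the input is exhausted. This assumes the subclass's map() checks for the null sentinel; it is a fragment, not standalone code.

```java
// Hypothetical run() override inside a Mapper subclass implementing the
// "consume everything, then emit" pattern described above.
@Override
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKeyValue()) {
        // map() only accumulates state in private fields; it writes nothing.
        map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
    // Sentinel call: a null key/value tells map() the input is exhausted,
    // so it can emit all accumulated records while spilling still works.
    map(null, null, context);
    cleanup(context);
}
```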

Answered Sep 26 '22 by Andy Lowry