
How can I find out if a task is a reducer or a combiner during run time in Hadoop?

If the operation performed with MapReduce is not commutative and associative, then the combiner cannot be the same as the reducer.
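Averaging is a handy way to see why. The sketch below is plain Java with no Hadoop dependency, and the class and method names are made up for illustration. It compares the true mean of some values with the result of averaging per-split averages, which is roughly what would happen if a mean-computing reducer were reused as a combiner.

```java
import java.util.List;

// Illustrative only: plain Java, not Hadoop API. All names are hypothetical.
final class MeanOfMeansDemo {

    // Arithmetic mean of a non-empty list.
    static double mean(List<Double> values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    public static void main(String[] args) {
        // Pretend these are the values for one key, seen by two different map tasks.
        List<Double> split1 = List.of(1.0, 2.0);
        List<Double> split2 = List.of(3.0, 4.0, 5.0);

        // What a reducer reused as a combiner would yield: the mean of partial means.
        double meanOfMeans = mean(List.of(mean(split1), mean(split2)));

        // The correct answer over all five values.
        double trueMean = mean(List.of(1.0, 2.0, 3.0, 4.0, 5.0));

        System.out.println("mean of means = " + meanOfMeans); // 2.75
        System.out.println("true mean     = " + trueMean);    // 3.0
    }
}
```

The two results differ because the splits have different sizes, so the partial means need to carry their counts along.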

For example, when calculating an average, the combiner sums the values for a key, and the reducer sums them and then divides the total by the number of values for that key. The combiner's code differs only slightly from the reducer's. What if you could use the same class for both combiner and reducer and have a piece of code that determines whether the current task is a combiner or a reducer? If it is a reducer, it divides the sum by the count.

Something like this:

protected void reduce(Text keyIn, Iterable<PairWritable> valuesIn,
    Context context)
    throws IOException, InterruptedException {
  double sum = 0.0d;
  long count = 0L;

  // Accumulate the partial sums and counts for this key.
  for (PairWritable valueIn : valuesIn) {
    sum += valueIn.getSum();
    count += valueIn.getCount();
  }

  // Placeholder: should be true only when running as the reducer.
  if (THIS_IS_A_REDUCER) {
    sum /= count;
  }

  context.write(keyIn, new PairWritable(sum, count));
}

Is it possible to do this? Can the placeholder THIS_IS_A_REDUCER above be replaced with something?

I can determine whether a task is a mapper or a reducer from the task attempt ID string, but both combiners and reducers seem to have similar string patterns.
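For reference, the map/reduce check mentioned above can be sketched in plain Java. This assumes the usual attempt ID layout, attempt_<jobtracker id>_<job>_<m|r>_<task>_<attempt>, as in the example ID attempt_200707121733_0003_m_000005_0. The class and method names are hypothetical. A combiner typically runs inside a map or reduce task and reports that task's ID, so this check alone is not expected to single it out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: parses the ID string by hand instead of using Hadoop classes.
final class AttemptIdDemo {

    // Assumed layout: attempt_<jtIdentifier>_<jobNumber>_<type>_<taskNumber>_<attemptNumber>
    private static final Pattern ATTEMPT_ID =
        Pattern.compile("attempt_[^_]+_\\d+_([a-z])_\\d+_\\d+");

    // Returns "map", "reduce", "other" (unrecognized type letter), or "unknown" (no match).
    static String taskType(String attemptId) {
        Matcher m = ATTEMPT_ID.matcher(attemptId);
        if (!m.matches()) {
            return "unknown";
        }
        switch (m.group(1)) {
            case "m": return "map";
            case "r": return "reduce";
            default:  return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(taskType("attempt_200707121733_0003_m_000005_0")); // map
        System.out.println(taskType("attempt_200707121733_0003_r_000001_0")); // reduce
    }
}
```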

Calin-Andrei Burloiu asked Feb 20 '23 12:02


1 Answer

This is a flawed question. Whenever you find you need to distinguish which reduce() a task is calling, add a separate combiner class instead. For example, you write

public static class Combine extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
  // Combiner: emit partial results only.
  public void reduce(Text key, Iterator<Text> message, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {}
}

public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
  // Reducer: produce the final result.
  public void reduce(Text key, Iterator<Text> message, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {}
}

In main(), you write

conf.setReducerClass(Reduce.class);
conf.setCombinerClass(Combine.class);
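
Applied to the averaging case from the question, the two classes would share the summing loop and differ only in the last step. The following is a plain-Java sketch of that split, not Hadoop code: Pair, combine(), and reduce() are hypothetical stand-ins for PairWritable, Combine.reduce(), and Reduce.reduce().

```java
import java.util.List;

// Illustrative only: models the combiner/reducer split without any Hadoop types.
final class AverageSplitDemo {

    // Stand-in for the question's PairWritable: a partial sum plus a value count.
    static final class Pair {
        final double sum;
        final long count;

        Pair(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Shared step: add up partial sums and counts.
    static Pair merge(List<Pair> parts) {
        double sum = 0.0;
        long count = 0L;
        for (Pair p : parts) {
            sum += p.sum;
            count += p.count;
        }
        return new Pair(sum, count);
    }

    // Combiner role: emit the merged pair unchanged so it can be merged again later.
    static Pair combine(List<Pair> parts) {
        return merge(parts);
    }

    // Reducer role: merge, then divide once at the very end.
    static double reduce(List<Pair> parts) {
        Pair total = merge(parts);
        return total.sum / total.count;
    }

    public static void main(String[] args) {
        // Two map-side groups for the same key, each value starting with count 1.
        Pair a = combine(List.of(new Pair(1.0, 1), new Pair(2.0, 1)));
        Pair b = combine(List.of(new Pair(3.0, 1), new Pair(4.0, 1), new Pair(5.0, 1)));
        System.out.println(reduce(List.of(a, b))); // 3.0
    }
}
```

Because the combiner's output has the same shape as its input, the result stays correct whether the framework runs the combiner zero, one, or several times.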
hsh answered Apr 27 '23 11:04