
Reducer starts before mapper has finished

I am running a MapReduce program. I get similar output whether I run it with only the mapper or with both the mapper and the reducer.

After the output shown below, the job never completes; it just hangs there.

I don't understand why the reducer starts before the mapper has reached 100%. What might be the problem?

Output:

Map 10% Reduce 0%
Map 19% Reduce 0%
Map 21% Reduce 0%
Map 39% Reduce 0%
Map 49% Reduce 0%
Map 63% Reduce 0% 
Map 67% Reduce 0% 
Map 68% Reduce 0% 
Map 68% Reduce 22%
Map 69% Reduce 22%

Here is the mapper code:

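import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class EntityCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  // Accumulates input lines until a complete record (24 fields) has been read.
  static String total_record = "";

  @Override
  protected void map(LongWritable baseAddress, Text line, Context context)
        throws IOException, InterruptedException {

    Text entity = new Text();
    IntWritable one = new IntWritable(1);

    // Append the current line and re-split the accumulated record on "::".
    total_record = total_record.concat(line.toString());
    String[] fields = total_record.split("::");
    if (fields.length == 24) {
        // Emit (entity, 1) using the 23rd field, then start a new record.
        entity.set(fields[22].trim());
        context.write(entity, one);
        total_record = "";
    }
  }
}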
Gaurav Gandhi asked Jun 17 '15 14:06





1 Answer

The reduce phase has three steps: shuffle, sort, and reduce. Shuffle is where the reducer collects data from each mapper. This can happen while mappers are still generating data, since it is only a data transfer. Sort and reduce, on the other hand, can only start once all the mappers are done. You can tell which step MapReduce is in by looking at the reducer completion percentage: 0-33% means it's doing shuffle, 34-66% is sort, and 67-100% is reduce. This is why your reducers will sometimes seem "stuck" at 33%: they are waiting for the mappers to finish.
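
As a rough illustration of the percentage ranges above, here is a minimal sketch that maps a reducer completion percentage to the step it most likely corresponds to. The class `ReducePhaseDemo` and the method `phaseFor` are hypothetical names made up for this example; they are not part of the Hadoop API, and the boundaries are simply the ones quoted in this answer.

```java
public class ReducePhaseDemo {

    // Hypothetical helper (not a Hadoop API): maps a reducer completion
    // percentage to the step named in the answer, using the quoted ranges
    // 0-33% shuffle, 34-66% sort, 67-100% reduce.
    static String phaseFor(int reducePercent) {
        if (reducePercent < 0 || reducePercent > 100) {
            throw new IllegalArgumentException("percentage out of range: " + reducePercent);
        }
        if (reducePercent <= 33) {
            return "shuffle";
        }
        if (reducePercent <= 66) {
            return "sort";
        }
        return "reduce";
    }

    public static void main(String[] args) {
        // Sample values, including the 22% seen in the question's output.
        int[] samples = {0, 22, 33, 50, 67, 100};
        for (int pct : samples) {
            System.out.println("Reduce " + pct + "% -> " + phaseFor(pct));
        }
    }
}
```

By this mapping, the "Reduce 22%" in the question's output falls in the shuffle range, which would be consistent with the reducer only copying map output while the mappers are still running.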

RAMKESH MEENA answered Oct 19 '22 23:10
