Hadoop Mapper Context object

Tags:

hadoop

How does the Hadoop framework call the run() method of a Mapper or Reducer class? The framework calls run(), but that method requires a context object, so how does Hadoop pass that object in? What information does the object hold?

Jayant Jadhav asked Oct 21 '22 10:10


1 Answer

The run() method is called through Java runtime polymorphism (i.e. method overriding). As line 569 of the MapTask.java source referenced below shows, your Mapper/Reducer subclass is instantiated using the Java Reflection API. The MapTask class gets the name of that subclass from the Job configuration object, where the client program registered it with job.setMapperClass().
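The reflection step described above can be sketched without any Hadoop dependency. This is only an illustration under stated assumptions: `BaseMapper`, `WordCountMapper`, the `demo.mapper.class` key, and the plain `Map` standing in for the job configuration are all hypothetical names, not Hadoop's own classes or configuration keys.

```java
import java.util.HashMap;
import java.util.Map;

public class ReflectiveMapperDemo {

    /** Hypothetical stand-in for the framework's base Mapper class. */
    public static class BaseMapper {
        public String describe() { return "base mapper"; }
    }

    /** Hypothetical user subclass, the kind a client would register on the job. */
    public static class WordCountMapper extends BaseMapper {
        @Override
        public String describe() { return "word count mapper"; }
    }

    /** Reads the class name from the configuration and instantiates it reflectively. */
    public static BaseMapper createMapper(Map<String, String> conf) {
        String className = conf.get("demo.mapper.class"); // hypothetical key
        try {
            Class<? extends BaseMapper> cls =
                    Class.forName(className).asSubclass(BaseMapper.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // Client side: store only the class name in the configuration.
        Map<String, String> conf = new HashMap<>();
        conf.put("demo.mapper.class", WordCountMapper.class.getName());

        // Framework side: the static type is BaseMapper, the object is the subclass.
        BaseMapper mapper = createMapper(conf);
        System.out.println(mapper.describe());
    }
}
```

The framework-side code never names `WordCountMapper` directly; it only knows the base type, which is why the later call to an overridden method ends up in the user's class.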

The following code is taken from the Hadoop source file MapTask.java:

    mapperContext = contextConstructor.newInstance(mapper, job, getTaskID(),
                                                   input, output, committer,
                                                   reporter, split);

    input.initialize(split, mapperContext);
    mapper.run(mapperContext);
    input.close();

Line 621 is an example of runtime polymorphism. There, MapTask calls run() on the configured mapper, passing the mapper context as the argument. If your class does not override run(), the call resolves to the run() implementation in org.apache.hadoop.mapreduce.Mapper, which in turn calls the map() method of the configured mapper for each input record.
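To make the dispatch concrete, here is a minimal, Hadoop-free sketch of the same pattern. The `Context`, `BaseMapper`, and `UpperCaseMapper` classes are simplified, hypothetical stand-ins modelled loosely on the shape of Mapper.run(); the real classes are generic over key and value types and declare checked exceptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RunDispatchDemo {

    /** Hypothetical, much-reduced stand-in for the mapper context. */
    public static class Context {
        private final Iterator<String> input;
        private final List<String> output = new ArrayList<>();
        private String current;

        public Context(List<String> records) { this.input = records.iterator(); }

        /** Advances to the next record; returns false when input is exhausted. */
        public boolean nextKeyValue() {
            if (!input.hasNext()) {
                return false;
            }
            current = input.next();
            return true;
        }

        public String getCurrentValue() { return current; }
        public void write(String record) { output.add(record); }
        public List<String> getOutput() { return output; }
    }

    /** Base class: run() drives the loop, map() is the hook subclasses override. */
    public static class BaseMapper {
        protected void setup(Context context) { }
        protected void map(String value, Context context) { context.write(value); }
        protected void cleanup(Context context) { }

        public void run(Context context) {
            setup(context);
            try {
                while (context.nextKeyValue()) {
                    map(context.getCurrentValue(), context); // dispatches to the subclass
                }
            } finally {
                cleanup(context);
            }
        }
    }

    /** User mapper that overrides map() but not run(). */
    public static class UpperCaseMapper extends BaseMapper {
        @Override
        protected void map(String value, Context context) {
            context.write(value.toUpperCase());
        }
    }

    /** Plays the role of the task: build a context, hand it to run(). */
    public static List<String> runTask(BaseMapper mapper, List<String> records) {
        Context context = new Context(records);
        mapper.run(context);
        return context.getOutput();
    }

    public static void main(String[] args) {
        System.out.println(runTask(new UpperCaseMapper(), Arrays.asList("a", "b")));
    }
}
```

Because `UpperCaseMapper` inherits run() unchanged, the task-side code calls the base implementation, and the loop inside it reaches the overridden map().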

At line 616 of the same source, MapTask creates the context object, which carries the job configuration and related details (as mentioned by @harpun), and then passes it to run() at line 621.
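As a rough illustration of a context being built reflectively and then carrying task details, consider the sketch below. `TaskContext`, its three fields, and the sample values are assumptions made for this example; the real constructor call in the excerpt takes more arguments (mapper, job, task ID, input, output, committer, reporter, split).

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

public class ContextConstructionDemo {

    /** Hypothetical context bundling what a task needs in a single object. */
    public static class TaskContext {
        private final Map<String, String> conf;
        private final String taskId;
        private final String split;

        public TaskContext(Map<String, String> conf, String taskId, String split) {
            this.conf = conf;
            this.taskId = taskId;
            this.split = split;
        }

        public Map<String, String> getConfiguration() { return conf; }
        public String getTaskId() { return taskId; }
        public String getInputSplit() { return split; }
    }

    /** Looks up the matching constructor and invokes it with the task details. */
    public static TaskContext buildContext(Map<String, String> conf,
                                           String taskId, String split) {
        try {
            Constructor<TaskContext> ctor =
                    TaskContext.class.getConstructor(Map.class, String.class, String.class);
            return ctor.newInstance(conf, taskId, split);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot build context", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("demo.job.name", "wordcount"); // hypothetical setting

        TaskContext context = buildContext(conf, "task-0001", "split-0");

        // Anything handed the context can read the job settings from it.
        System.out.println(context.getConfiguration().get("demo.job.name"));
        System.out.println(context.getTaskId());
    }
}
```

The point of the design is that run() needs only one parameter: everything the mapper may want to consult travels inside the context.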

The same explanation holds for reduce tasks, with the ReduceTask class as the entry point.

Niranjan Sarvi answered Oct 27 '22 11:10