 

Do Mappers and Reducers in Hadoop have to be static classes?

I'm attempting to do something simple in Hadoop, and in every example I've found, mappers and reducers are defined as static classes. My task will be decomposed into several map phases and one final reduce. What if I'd like to reuse one of my mappers in another job? If my mapper is defined as a static inner class, can I use it in another job? Also, non-trivial problems may require many more (and more complicated) mappers, so putting them all in one giant file becomes terrible to maintain.

Is there any way to have mappers and reducers as regular classes (possibly even in a separate JAR), rather than part of the job class itself?

asked Feb 12 '13 by grafthez

People also ask

Does a Mapper class have to be static?

When mapper and reducer classes are declared as inner classes of another class, they must be declared static so that they do not depend on an instance of the enclosing class.
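For illustration, a minimal sketch (hypothetical names) of a mapper declared as a static nested class, using the org.apache.hadoop.mapreduce API:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class WordCountJob {

        // Declared static, so Hadoop can instantiate it by reflection
        // without an instance of the enclosing WordCountJob class.
        public static class TokenizerMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {

            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        context.write(word, ONE);
                    }
                }
            }
        }
    }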

Should a mapper be static?

In Java, it's good practice to mark a pure function as static, even if it is private. This makes it explicit that the method is independent of any instance, even though it doesn't guarantee that the method is a pure function.
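A small illustration of that convention (hypothetical names):

    public class PriceUtils {

        // Pure function: the result depends only on the arguments, so
        // marking it static makes the independence from instance state
        // explicit, even though 'static' alone cannot guarantee purity.
        static long applyDiscount(long priceInCents, int percent) {
            return priceInCents - (priceInCents * percent) / 100;
        }
    }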

How do the mapper and reducer work in Hadoop?

A Hadoop Java program consists of a Mapper class and a Reducer class, along with a driver class. The Hadoop Mapper is a task that processes each input record from a file and generates intermediate output, which serves as the input to the Reducer. The Reducer then produces its output as new key-value pairs.
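To illustrate, a minimal Reducer sketch (hypothetical names) that sums the counts a word-count mapper emitted and writes each result as a new key-value pair:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();   // total the counts the mappers emitted
            }
            context.write(key, new IntWritable(sum));   // new key-value pair
        }
    }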

What are the parameters of mappers and reducers?

The four basic parameters of a mapper are LongWritable, Text, Text, and IntWritable. The first two represent the input parameters and the last two represent the intermediate output parameters. The four basic parameters of a reducer are Text, IntWritable, Text, and IntWritable.
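Those parameters appear as the generic type arguments on the class declarations; a minimal sketch (hypothetical names):

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>: byte offset and line of
    // text come in; word/count pairs go out as intermediate output.
    class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> { }

    // Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>: the reducer's input
    // types must match the mapper's output types.
    class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> { }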


2 Answers

Is your question whether the class has to be static, may be static, may be inner, or should be inner?

Hadoop itself needs to be able to instantiate your Mapper or Reducer by reflection, given the class reference/name configured in your Job. This will fail if it is a non-static inner class, since an instance can then be created only in the context of an instance of its enclosing class, which Hadoop presumably knows nothing about. (Unless the inner class extends its enclosing class, I suppose.)
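Plain reflection shows the same failure Hadoop would hit (Hadoop goes through ReflectionUtils internally; names here are hypothetical):

    public class Outer {

        public static class StaticNested { }   // implicit no-arg constructor

        public class Inner { }   // constructor takes a hidden Outer argument

        public static void main(String[] args) throws Exception {
            // Works: a static nested class needs no enclosing instance.
            Object ok = StaticNested.class.getDeclaredConstructor().newInstance();
            System.out.println("static nested: " + ok);

            // Throws NoSuchMethodException: Inner's only constructor takes
            // an Outer instance, which Hadoop has no way to supply.
            Object broken = Inner.class.getDeclaredConstructor().newInstance();
            System.out.println("inner: " + broken);
        }
    }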

So, to answer the first question: it should not be non-static, since that almost surely makes it unusable. To answer the second and third: yes, it may be static, and it may be a static (inner) class.

To me, a Mapper or Reducer is plainly a top-level concept and deserves a top-level class. Some like to make them static inner classes to pair them with a "Runner" class; I don't like this, as that is really what subpackages are for. You note another design reason to avoid it. To the fourth question: no, I don't believe inner classes are good practice here.
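For example, a minimal driver sketch, assuming TokenizerMapper and SumReducer exist as top-level classes (hypothetical names; think of the earlier sketches moved into their own files):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountDriver.class);   // which JAR to ship
            job.setMapperClass(TokenizerMapper.class);  // top-level mapper
            job.setReducerClass(SumReducer.class);      // top-level reducer
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }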

Final question: yes, the Mapper and Reducer classes can be in a separate JAR file from the job. You tell Hadoop which JAR file contains all of this code, and that's the one it will ship off to the workers. The workers don't need your Job class; however, they do need anything that the Mapper and Reducer depend on, in that same JAR.
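Continuing the driver sketch above, one way to put a separate dependency JAR on the workers' classpath is the distributed cache (the HDFS path is hypothetical, and this assumes the JAR was uploaded to HDFS first); passing -libjars to the hadoop jar command achieves the same, provided the driver parses generic options via ToolRunner:

    // Ships the extra JAR to the workers via the distributed cache and
    // adds it to the task classpath, alongside the job JAR that
    // setJarByClass selected.
    job.addFileToClassPath(new Path("/libs/shared-mappers.jar"));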

answered Nov 14 '22 by Sean Owen


I feel the answer above is precise and its rationale sound. However, I feel that inner classes should be used when creating the map and reduce classes; IMO, all the code should be in one place.

Generics can also be used thoughtfully in that single class, ensuring there are no typecasting errors.
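A sketch of that style (hypothetical names): everything in one file, with the generic type parameters pinning down the key/value types so no casts are needed:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class AllInOneJob {

        // map() receives fully typed arguments; no casting anywhere.
        public static class InnerMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> { }

        // reduce() likewise gets Iterable<IntWritable>, already typed.
        public static class InnerReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> { }
    }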

answered Nov 15 '22 by Abhay Dandekar