
Hadoop MapReduce: Possible to define two mappers and reducers in one hadoop job class?

I have two separate Java classes that run two different MapReduce jobs. I can run them independently, and both operate on the same input files. So my question is whether it is possible to define two mappers and two reducers in one Java class, like

mapper1.class
mapper2.class
reducer1.class
reducer2.class

and then like

job.setMapperClass(mapper1.class);
job.setMapperClass(mapper2.class);
job.setCombinerClass(reducer1.class);
job.setCombinerClass(reducer2.class);
job.setReducerClass(reducer1.class);
job.setReducerClass(reducer2.class);

Do these set methods override the previously set classes or add the new ones to them? I tried the code, but it runs only the most recently set classes, which makes me think that they override. But there must be a way of doing this, right?

The reason I am asking is that I want to read the input files only once (one I/O pass) and then run both MapReduce computations on them. I would also like to know how to write the output files into two different folders. At the moment, the two jobs are separate, and each requires its own input and output directory.

Bob asked Jun 20 '12 15:06



2 Answers

You can have multiple mappers, but a single job can have only one reducer class. The features you need are MultipleInputs, MultipleOutputs and GenericWritable.
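To illustrate why the repeated setter calls in the question end up running only the last class, here is a minimal plain-Java model. It is not the Hadoop Job class; JobModel and its string keys are hypothetical, and it only assumes that a job keeps one configuration entry per role, which matches the behavior the question observed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a job configuration; NOT the Hadoop Job class.
public class JobModel {
    // One entry per role ("mapper", "reducer"); the key names are illustrative only.
    private final Map<String, String> conf = new HashMap<>();

    // Each role maps to exactly one key, so a second call replaces the first value.
    public void setMapperClass(String className)  { conf.put("mapper", className); }
    public void setReducerClass(String className) { conf.put("reducer", className); }

    public String getMapperClass()  { return conf.get("mapper"); }
    public String getReducerClass() { return conf.get("reducer"); }

    public static void main(String[] args) {
        JobModel job = new JobModel();
        job.setMapperClass("Mapper1");
        job.setMapperClass("Mapper2");   // replaces Mapper1
        job.setReducerClass("Reducer1");
        job.setReducerClass("Reducer2"); // replaces Reducer1
        System.out.println(job.getMapperClass() + " " + job.getReducerClass());
        // prints "Mapper2 Reducer2"
    }
}
```

Under that assumption, calling a setter twice can never register two classes for the same role; the second mapper has to be attached some other way.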

Using MultipleInputs, you can set the mapper and the corresponding InputFormat for each input path. Here is my post about how to use it.

Using GenericWritable, you can wrap the different value classes emitted by the mappers in one type and tell them apart again in the reducer. Here is my post about how to use it.

Using MultipleOutputs, you can write different output classes from the same reducer.
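The overall idea (read the input once, tag each record with the computation it belongs to, and let one reducer route results to separate outputs) can be sketched in plain Java. This is only an in-memory model, not Hadoop API code: the class name, the "wc:" and "len:" tags, and the two toy analyses (a word count and a total character count) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of one pass feeding two analyses; all names are hypothetical.
public class TaggedSinglePass {
    // Map step: each line is read once and emits records for both analyses.
    static List<String[]> map(String line) {
        List<String[]> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            out.add(new String[] {"wc:" + word, "1"});              // analysis 1: word count
        }
        out.add(new String[] {"len:total", String.valueOf(line.length())}); // analysis 2: total characters
        return out;
    }

    // Shuffle and reduce: group by tagged key, sum the values, then route by tag.
    static Map<String, Map<String, Integer>> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String[] kv : map(line)) {
                grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(Integer.parseInt(kv[1]));
            }
        }
        // One map per tag stands in for one output folder per analysis.
        Map<String, Map<String, Integer>> outputs = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) sum += v;
            int colon = e.getKey().indexOf(':');
            String tag = e.getKey().substring(0, colon);
            String key = e.getKey().substring(colon + 1);
            outputs.computeIfAbsent(tag, t -> new TreeMap<>()).put(key, sum);
        }
        return outputs;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a b a", "b c")));
        // prints {len={total=8}, wc={a=2, b=2, c=1}}
    }
}
```

In a real job the tag would play the role that the wrapped value type plays with GenericWritable, and the per-tag maps correspond to the separate outputs written through MultipleOutputs.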

Chun answered Oct 18 '22 08:10



You can use the MultipleInputs and MultipleOutputs classes for this, but the output of both mappers will go to both reducers. If the data flows for the two mapper/reducer pairs really are independent of one another, then keep them as two separate jobs. By the way, MultipleInputs will run your mappers without change, but the reducers would have to be modified in order to use MultipleOutputs.

Chris Gerken answered Oct 18 '22 10:10
