Is it possible to print the Mapper and Reducer output for a single job in Hadoop MapReduce?

For a given MR job, I need to produce two output files: one should be the output of the Mapper, and the other should be the output of the Reducer (which is just an aggregation of the Mapper output).

Can I have both the mapper and the reducer output written in a single job?

EDIT:

In Job 1 (mapper phase only), the output contains 20 fields per row, which has to be written to HDFS (file1). In Job 2 (mapper and reducer), the mapper takes the Job 1 output as input, deletes a few fields to bring it into a standard format (only 10 fields), and passes it to the reducer, which writes file2.

I need both file1 and file2 in HDFS. My question is whether the Job 1 mapper can write the data to HDFS as file1, then modify the same data and pass it to a reducer.

PS: At the moment I am using two jobs with a chaining mechanism. The first job contains only a mapper; the second job contains a mapper and a reducer.

Abhinay asked Oct 30 '22 12:10


1 Answer

You could perhaps use the MultipleOutputs class to define one output for the mapper and (optionally) one for the reducer. In the mapper, you will have to write each record twice: once to the output file (using MultipleOutputs) and once as a key-value pair emitted to the reducer (as usual).
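A minimal sketch of the MultipleOutputs approach in a single job, using the new `org.apache.hadoop.mapreduce` API. Everything specific here is a hypothetical stand-in for the question's real logic: tab-separated input, a "standard format" defined as the first 10 fields, the named output `file1`, a reducer that simply counts identical trimmed rows, and the names `MapAndReduceOutputs` and `keepFirstFields`.

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MapAndReduceOutputs {

    // Hypothetical "standard format": the first 10 tab-separated fields.
    static final int STANDARD_FIELDS = 10;

    // Hypothetical helper: keeps at most n leading tab-separated fields.
    static String keepFirstFields(String line, int n) {
        String[] fields = line.split("\t", -1);
        return String.join("\t", Arrays.copyOf(fields, Math.min(n, fields.length)));
    }

    public static class FullAndTrimmedMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text trimmed = new Text();
        private MultipleOutputs<Text, IntWritable> mos;

        @Override
        protected void setup(Context context) {
            mos = new MultipleOutputs<>(context);
        }

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Write 1: the full row goes to the named output "file1".
            mos.write("file1", NullWritable.get(), line);
            // Write 2: the trimmed row is emitted to the reducer as usual.
            trimmed.set(keepFirstFields(line.toString(), STANDARD_FIELDS));
            context.write(trimmed, ONE);
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            mos.close(); // flushes and closes the named output writers
        }
    }

    // Hypothetical aggregation: count identical trimmed rows (regular output = file2).
    public static class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable total = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            total.set(sum);
            context.write(key, total);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "mapper and reducer outputs");
        job.setJarByClass(MapAndReduceOutputs.class);
        job.setMapperClass(FullAndTrimmedMapper.class);
        job.setReducerClass(CountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Register the mapper-side output; NullWritable keys are not written by TextOutputFormat.
        MultipleOutputs.addNamedOutput(job, "file1", TextOutputFormat.class,
                NullWritable.class, Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Run with an input path and an output path as arguments. Both results should land in the same output directory: the full rows as `file1-m-*` files written by the mappers, and the aggregated rows as the usual `part-r-*` files written by the reducer.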

Then, you could also take advantage of the ChainMapper class to define the following workflow in a single job:

Mapper 1 (file 1) -> Mapper 2 -> Reducer (file 2)
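A sketch of that chained workflow, wired with `ChainMapper.addMapper` and `ChainReducer.setReducer`. It reuses hypothetical choices (tab-separated input, first 10 fields as the standard format, a counting reducer, the named output `file1`, and the class names shown). Like the answer itself, this combination is untested: it assumes MultipleOutputs behaves inside a chained mapper the same way it does in an ordinary one.

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
import org.apache.hadoop.mapreduce.lib.chain.ChainReducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ChainedOutputsJob {

    // Hypothetical helper: keeps at most n leading tab-separated fields.
    static String keepFirstFields(String line, int n) {
        String[] fields = line.split("\t", -1);
        return String.join("\t", Arrays.copyOf(fields, Math.min(n, fields.length)));
    }

    // Mapper 1: writes the full row to "file1", then passes it on unchanged.
    public static class FullRowMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        private MultipleOutputs<LongWritable, Text> mos;

        @Override
        protected void setup(Context context) {
            mos = new MultipleOutputs<>(context);
        }

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            mos.write("file1", NullWritable.get(), line);
            context.write(offset, line);
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            mos.close();
        }
    }

    // Mapper 2: trims each row to the hypothetical 10-field standard format.
    public static class TrimMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(keepFirstFields(line.toString(), 10)), ONE);
        }
    }

    // Reducer: hypothetical aggregation (count per trimmed row); its output is file2.
    public static class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "mapper1 -> mapper2 -> reducer");
        job.setJarByClass(ChainedOutputsJob.class);
        // Each chain element's key/value classes must match its neighbours.
        ChainMapper.addMapper(job, FullRowMapper.class, LongWritable.class, Text.class,
                LongWritable.class, Text.class, new Configuration(false));
        ChainMapper.addMapper(job, TrimMapper.class, LongWritable.class, Text.class,
                Text.class, IntWritable.class, new Configuration(false));
        ChainReducer.setReducer(job, CountReducer.class, Text.class, IntWritable.class,
                Text.class, IntWritable.class, new Configuration(false));
        MultipleOutputs.addNamedOutput(job, "file1", TextOutputFormat.class,
                NullWritable.class, Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Splitting the work into two mappers keeps the "write file1" step and the "trim to 10 fields" step separate, which mirrors the two-job setup from the question without a second pass over the data.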

To be honest, I've never used this logic, but you can give it a try. Good luck!

vefthym answered Nov 15 '22 08:11