
Hadoop - how to use and reduce multiple inputs?

Mapper/Reducer 1 --> (key,value)
                      /   |   \
                     /    |    \
     Mapper/Reducer 2     |    Mapper/Reducer 4
     -> (oKey,oValue)     |    -> (xKey, xValue)
                          |
                          |
                    Mapper/Reducer 3
                    -> (aKey, aValue)

I have a log file, which I aggregate with MR1. Mapper2, Mapper3, and Mapper4 each take the output of MR1 as their input. The jobs are chained.

MR1 Output:

User     {infos of user:[{data here},{more data},{etc}]}
..

MR2 Output:

timestamp       idCount
..

MR3 Output:

timestamp        loginCount
..

MR4 Output:

timestamp        someCount
..

I want to combine the outputs of MR2, MR3, and MR4 into one final output:

timestamp     idCount     loginCount   someCount
..
..
..

Is there a way to do this without Pig or Hive? I'm using Java.

JustTheAverageGirl asked Nov 03 '22 00:11


1 Answer

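You can do that with MultipleInputs: add one more job whose inputs are the output directories of MR2, MR3, and MR4, give each input its own Mapper that tags its values with their source, and let a single Reducer merge the tagged values for each timestamp (see sample here).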
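To make the idea concrete, here is a minimal plain-Java sketch of the tag-and-merge logic such a final job could perform. It runs without Hadoop and only simulates the shuffle with a sorted map. Everything named here is an assumption for illustration: the class `TimestampJoin`, the tags `id`, `login`, and `some`, tab-separated `timestamp<TAB>count` input lines, and the choice to fill a missing count with `0`. In a real job, the body of `tag` would live in one Mapper per input, each registered with `MultipleInputs.addInputPath(job, path, TextInputFormat.class, YourMapper.class)` from `org.apache.hadoop.mapreduce.lib.input`, and the body of `reduce` would live in the Reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation of a reduce-side join keyed on timestamp (hypothetical names). */
class TimestampJoin {

    // Hypothetical source tags, in output column order: idCount, loginCount, someCount.
    static final String[] TAGS = {"id", "login", "some"};

    /** Map step: turn "timestamp<TAB>count" into the pair (timestamp, "tag:count"). */
    static String[] tag(String line, String tag) {
        String[] parts = line.split("\t");
        return new String[] {parts[0], tag + ":" + parts[1]};
    }

    /** Reduce step: put each tagged value into its column; absent counts default to 0 (assumption). */
    static String reduce(String timestamp, List<String> taggedValues) {
        String[] columns = {"0", "0", "0"};
        for (String value : taggedValues) {
            int sep = value.indexOf(':');
            String tag = value.substring(0, sep);
            for (int i = 0; i < TAGS.length; i++) {
                if (TAGS[i].equals(tag)) {
                    columns[i] = value.substring(sep + 1);
                }
            }
        }
        return timestamp + "\t" + String.join("\t", columns);
    }

    /** Simulated shuffle: group tagged pairs by timestamp, then reduce each group. */
    static List<String> join(List<String> mr2, List<String> mr3, List<String> mr4) {
        Map<String, List<String>> groups = new TreeMap<>();
        List<List<String>> inputs = List.of(mr2, mr3, mr4);
        for (int i = 0; i < inputs.size(); i++) {
            for (String line : inputs.get(i)) {
                String[] kv = tag(line, TAGS[i]);
                groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            out.add(reduce(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample rows standing in for the outputs of MR2, MR3, and MR4.
        List<String> result = join(
            List.of("1000\t5", "2000\t7"),
            List.of("1000\t3"),
            List.of("2000\t9"));
        result.forEach(System.out::println);
    }
}
```

Tagging is what lets the reducer tell the three counts apart, since all of them arrive under the same timestamp key and in no guaranteed order.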

Arnon Rotem-Gal-Oz answered Nov 14 '22 22:11