Mapper/Reducer 1 --> (key,value)
                      /   |   \
                     /    |    \
     Mapper/Reducer 2     |    Mapper/Reducer 4
     -> (oKey,oValue)     |    -> (xKey, xValue)
                          |
                          |
                    Mapper/Reducer 3
                    -> (aKey, aValue)
I have a log file, which I aggregate with MR1. Mapper2, Mapper3 and Mapper4 each take the output of MR1 as their input. The jobs are chained.
MR1 Output:
User     {infos of user:[{data here},{more data},{etc}]}
..
MR2 Output:
timestamp       idCount
..
MR3 Output:
timestamp        loginCount
..
MR4 Output:
timestamp        someCount
..
I want to combine the outputs of MR2-4 into one final output:
timestamp     idCount     loginCount   someCount
..
..
..
Is there a way to do this without Pig or Hive? I'm using Java.
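You can do that with MultipleInputs: add the output directories of MR2, MR3 and MR4 as inputs of one more job, give each input its own mapper that tags its count, and merge the tagged values per timestamp in the reducer. See the sample here.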
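To make the idea concrete, here is a minimal sketch of the tag-and-merge logic in plain Java, without the Hadoop classes so it can be run on its own. The class name TimestampJoin, the tags "id", "login" and "some", the "tag:count" value format and the default of 0 for a missing count are all assumptions for illustration, not something from the question. In a real job, each of the three mappers would be registered with MultipleInputs.addInputPath(job, path, inputFormatClass, mapperClass) and emit (timestamp, tag(...)), and the reducer would call something like merge(...) on the values it receives for one timestamp.

```java
import java.util.Arrays;

// Hypothetical helper: the logic the three mappers and the single reducer would share.
class TimestampJoin {
    // Assumed tags, in final column order: idCount, loginCount, someCount.
    static final String[] TAGS = {"id", "login", "some"};

    // What each per-input mapper would emit as its value, e.g. "login:3".
    static String tag(String tag, String count) {
        return tag + ":" + count;
    }

    // What the reducer would do with all tagged values of one timestamp key.
    static String merge(String timestamp, Iterable<String> taggedValues) {
        String[] columns = new String[TAGS.length];
        Arrays.fill(columns, "0"); // assumption: a missing count is reported as 0
        for (String value : taggedValues) {
            int sep = value.indexOf(':');
            if (sep < 0) {
                continue; // skip values that carry no tag
            }
            String valueTag = value.substring(0, sep);
            for (int i = 0; i < TAGS.length; i++) {
                if (TAGS[i].equals(valueTag)) {
                    columns[i] = value.substring(sep + 1);
                }
            }
        }
        // Tab-separated line: timestamp, idCount, loginCount, someCount.
        return timestamp + "\t" + String.join("\t", columns);
    }
}
```

The tag is needed because the reducer receives the values of one timestamp in no guaranteed order, so it cannot tell from position alone which job a count came from.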