Between the partitioner and the combiner, which runs first?
I thought the partitioner runs first, then the combiner, and then the keys are redirected to different reducers. But that last step looks like what the partitioner itself does, so I'm confused. Please help me understand.
Partition comes first.
According to "Hadoop: The Definitive Guide", the Mapper's output is first written to a memory buffer, then spilled to a local directory when the buffer is about to overflow. The spilled data is partitioned according to the Partitioner, and within each partition the records are sorted and, if a Combiner is given, combined.
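The spill sequence described above (partition, then sort, then combine) can be sketched in plain Java. This is only a rough simulation, not Hadoop code: the class name SpillSketch, the method names, and the sample sentence with repeated words are made up for illustration. The partition formula follows the one used by Hadoop's default HashPartitioner, but it is applied here to String keys rather than Text keys, so real partition numbers may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SpillSketch {
    // Same formula as Hadoop's default HashPartitioner, applied to a String key.
    static int partition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Simulates one spill of (word, 1) records emitted by a word-count mapper.
    static List<Map<String, Integer>> spill(String[] words, int numPartitions) {
        // Step 1: assign every record to a partition.
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String word : words) {
            buckets.get(partition(word, numPartitions)).add(word);
        }

        // Steps 2 and 3: sort each partition by key, then combine equal keys.
        List<Map<String, Integer>> result = new ArrayList<>();
        for (List<String> bucket : buckets) {
            Collections.sort(bucket);
            Map<String, Integer> combined = new LinkedHashMap<>(); // keeps sorted order
            for (String word : bucket) {
                combined.merge(word, 1, Integer::sum); // the "combiner": sum the counts
            }
            result.add(combined);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input with repeated words so the combine step has work to do.
        String[] words = "the quick brown fox jumped over a lazy dog the fox".split(" ");
        List<Map<String, Integer>> spilled = spill(words, 2);
        for (int p = 0; p < spilled.size(); p++) {
            System.out.println("partition " + p + ": " + spilled.get(p));
        }
    }
}
```

Because the combine step only ever sees records that already share a partition, each combined record is already assigned to exactly one reducer.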
You can verify this by modifying the WordCount MapReduce program to log the time at each step. For the input "the quick brown fox jumped over a lazy dog", my result was:
Word, Step, Time
fox, Mapper, **********754
fox, Partitioner, **********754
fox, Combiner, **********850
fox, Reducer, **********904
The timestamps show that the Combiner runs after the Partitioner.