
What runs first: the partitioner or the combiner?

Tags:

hadoop

Between the partitioner and the combiner, which one runs first?

I was under the impression that the partitioner runs first, then the combiner, and then the keys are redirected to different reducers. But that redirection step also looks like partitioning, so I'm confused. Please help me understand.

asked Feb 27 '14 at 06:02 by user2345694


1 Answer

The partitioner runs first.

According to "Hadoop: The Definitive Guide", the output of the Mapper is first written to an in-memory buffer and then spilled to a local directory when the buffer is about to overflow. Before the data is spilled, it is divided into partitions according to the Partitioner; within each partition, the records are sorted by key and, if a Combiner is configured, the combiner is run on the sorted output.
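To make that order of operations concrete, here is a minimal single-JVM simulation of one map-side spill for a WordCount-style job. It is not Hadoop code and uses no Hadoop classes: the class `SpillSimulation` and its methods `partition` and `spill` are hypothetical names for illustration. The `partition` method uses the same formula as Hadoop's default `HashPartitioner`, but applied to a plain Java `String` rather than a Hadoop `Text` key, so the actual partition numbers will differ from a real job. It also simplifies things: real Hadoop may run the combiner zero, one, or several times (for example, again while merging spill files), whereas this sketch runs it exactly once per spill.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy simulation of the map-side spill order: map -> partition ->
 * sort within each partition -> combine. NOT Hadoop code.
 */
public class SpillSimulation {

    /** Same formula as Hadoop's HashPartitioner: clear the sign bit, then modulo. */
    public static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /**
     * Simulates one spill of a WordCount-style map task.
     * Returns partition -> (word -> combined count); each step is appended to log.
     */
    public static Map<Integer, Map<String, Integer>> spill(
            String line, int numReducers, List<String> log) {
        // 1. Map: emit (word, 1); each record is tagged with its partition as it is collected.
        Map<Integer, List<String>> buffer = new TreeMap<>();
        for (String word : line.trim().split("\\s+")) {
            log.add(word + ", Mapper");
            int p = partition(word, numReducers);
            log.add(word + ", Partitioner, partition " + p);
            buffer.computeIfAbsent(p, k -> new ArrayList<>()).add(word);
        }

        // 2. Sort by key within each partition, then 3. combine (sum the 1s per word).
        Map<Integer, Map<String, Integer>> spilled = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> entry : buffer.entrySet()) {
            List<String> words = entry.getValue();
            Collections.sort(words);
            Map<String, Integer> combined = new TreeMap<>();
            for (String word : words) {
                log.add(word + ", Combiner, partition " + entry.getKey());
                combined.merge(word, 1, Integer::sum);
            }
            spilled.put(entry.getKey(), combined);
        }
        return spilled;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Map<Integer, Map<String, Integer>> result =
                spill("the quick brown fox jumped over the lazy dog", 2, log);
        log.forEach(System.out::println);
        System.out.println(result);
    }
}
```

Running `main` prints every Partitioner line before any Combiner line, and the repeated word "the" is combined into a single count of 2 inside its partition, which is the point of combining after partitioning and sorting: the combiner only ever sees records that are headed to the same reducer.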

You can verify this by modifying the WordCount MapReduce program to log a timestamp at each step. For the input "the quick brown fox jumped over a lazy dog", my result was:


Word, Step, Time
fox, Mapper, **********754
fox, Partitioner, **********754
fox, Combiner, **********850
fox, Reducer, **********904


So the combiner runs after the partitioner.

answered Oct 02 '22 at 19:10 by Mike Song