Why do map and reduce run at the same time?

I am new to Hadoop. I remember learning somewhere that in Hadoop, all map functions have to complete before any reduce function can start.

But when I run a MapReduce program, I get a printout like this:

map(15%), reduce(5%)
map(20%), reduce(7%)
map(30%), reduce(10%)
map(38%), reduce(17%)
map(40%), reduce(25%)

Why do they run in parallel?

asked Sep 13 '13 at 19:09 by gywlily



1 Answer

Before the actual reduce phase starts, the shuffle, sort, and merge steps take place while mappers are still completing, and that is what this reduce percentage reflects. It is not the actual reduce phase. Each reducer starts copying a mapper's output as soon as that mapper finishes. This overlap reduces the overhead the framework would otherwise incur if it waited for all the mappers to finish before starting to shuffle, sort, and merge. Your reduce() function itself only starts running once all the map outputs have been fetched and merged.
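To illustrate why the reduce figure can climb while maps are still running, here is a simplified Python model. It assumes that a reduce task's progress is split into three roughly equal phases (copy/shuffle, sort/merge, and reduce), similar to how classic Hadoop reports ReduceTask progress. Under that assumption, the shuffle alone can account for up to about 33%. The function `reduce_progress` and its parameters are hypothetical. Real progress reporting is more involved: for example, when reducers are launched at all is governed by the slow-start setting (`mapreduce.job.reduce.slowstart.completedmaps` in Hadoop 2.x).

```python
# Simplified model (assumption): a reduce task's reported progress is split into
# three equally weighted phases: copy (shuffle), sort/merge, and reduce.
# This is a sketch, not Hadoop's actual code; all names here are hypothetical.


def reduce_progress(maps_done: int, total_maps: int,
                    sort_done: float = 0.0, reduce_done: float = 0.0) -> float:
    """Return the reported reduce progress as a percentage (0-100).

    maps_done / total_maps: how many map outputs this reducer has copied so far.
    sort_done, reduce_done: completed fraction (0.0-1.0) of the later phases,
    which can only advance once every map output has been copied.
    """
    if total_maps <= 0 or not 0 <= maps_done <= total_maps:
        raise ValueError("need total_maps > 0 and 0 <= maps_done <= total_maps")
    copy_fraction = maps_done / total_maps
    if maps_done < total_maps:
        # Still shuffling: the final merge and the user's reduce() cannot have started.
        sort_done = reduce_done = 0.0
    phase_weight = 1 / 3  # each phase contributes one third of the total
    return 100 * phase_weight * (copy_fraction + sort_done + reduce_done)


if __name__ == "__main__":
    total = 100
    for done in (10, 40, 75, 100):
        print(f"maps copied {done:3d}/{total}: reduce {reduce_progress(done, total):5.1f}%")
    # Only after all map outputs are in can the sort and reduce phases add progress.
    print(f"after merge and half of reduce(): {reduce_progress(total, total, 1.0, 0.5):5.1f}%")
```

In this model, any reduce percentage reported while maps are still running comes purely from copying finished map outputs. The later phases stay at zero until the last map output arrives.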

answered Sep 22 '22 at 21:09 by Tariq
