I am new to Hadoop. I remember learning somewhere that in Hadoop, all map functions have to complete before any reduce function can start.
But when I run a MapReduce program, I get a printout like this:
map(15%), reduce(5%)
map(20%), reduce(7%)
map(30%), reduce(10%)
map(38%), reduce(17%)
map(40%), reduce(25%)
Why do they appear to run in parallel?
Before the actual reduce phase starts, the shuffle, sort, and merge steps take place as mappers finish. The reduce percentage in your printout reflects that work, not the execution of your reduce function. It happens in parallel with the map phase to avoid the overhead that would be incurred if the framework waited for every mapper to complete before it began shuffling, sorting, and merging. Your reduce function itself is still called only after all map output has been collected, so what you learned is correct.
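To make the percentages concrete, here is a minimal sketch of how a reduce task's reported progress can be read. It assumes the commonly described model in which the progress is split into three equal parts: copy (shuffle), sort (merge), and the reduce function itself. The names `reduce_task_progress` and `phase_for` are hypothetical helpers written for this illustration, not part of the Hadoop API, and the job-level figure in the console is aggregated over all reduce tasks, so the mapping is only approximate.

```python
# Assumed model: reported reduce progress is divided into three equal
# phases. These helpers are illustrative only, not Hadoop API calls.
PHASES = ("copy", "sort", "reduce")


def reduce_task_progress(copy_done, sort_done, reduce_done):
    """Combine per-phase completion fractions (each 0.0 to 1.0) into one value."""
    return (copy_done + sort_done + reduce_done) / 3.0


def phase_for(progress_percent):
    """Return the phase implied by a reported reduce percentage."""
    if not 0 <= progress_percent <= 100:
        raise ValueError("progress must be between 0 and 100")
    # Each phase covers one third of the range; clamp 100% to the last phase.
    index = min(int(progress_percent * 3 // 100), len(PHASES) - 1)
    return PHASES[index]


# Reduce percentages taken from the printout in the question.
observed = [5, 7, 10, 17, 25]
phases = [phase_for(p) for p in observed]
print(phases)
```

Under this model, every reduce value in the printout falls in the first third, which matches the explanation above: the reducers are still fetching map output and have not called the reduce function yet. How early reducers begin fetching is governed by the `mapreduce.job.reduce.slowstart.completedmaps` setting, the fraction of map tasks that must finish before reducers are scheduled.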