I have two conceptual questions about MapReduce and Hadoop. I know how a simple single-iteration MapReduce program works, and I know what the mapper, reducer, and shuffle phases do, but I would still like to understand the following:
1) When is iterative MapReduce used?
2) I know that an identity mapper/reducer emits its input unchanged. But when would you use an identity mapper/reducer?
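1) Iterative MapReduce is used when the result cannot be computed in a single pass, so the output of one job is fed in as the input of the next until some stopping condition holds. An example of an iterative MR algorithm is Dijkstra's shortest path algorithm. At each iteration the nearest neighbours of all active nodes are explored, and the reduce phase is used to check whether the destination node has already been reached. Another example is Facebook's friends-of-friends (FoF) algorithm for suggesting new friends.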
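To make the iteration concrete, here is a minimal sketch in plain Python that simulates the map, shuffle, and reduce steps in memory. It is not Hadoop code, and the names `map_phase`, `reduce_phase`, and `shortest_paths` are made up for illustration. It also assumes every node appears as a key in the graph, and it stops when a pass changes no distance rather than when a particular destination is reached. Strictly speaking this is a parallel relaxation of all edges per pass (closer to Bellman-Ford than to textbook Dijkstra), which is roughly how shortest paths tend to be expressed in MapReduce.

```python
from collections import defaultdict

INF = float("inf")


def map_phase(node, dist, neighbours):
    """Emit the node's own distance plus a candidate distance per neighbour."""
    yield node, dist
    if dist != INF:
        for neighbour, weight in neighbours:
            yield neighbour, dist + weight


def reduce_phase(node, candidates):
    """Keep the smallest candidate distance seen for this node."""
    return node, min(candidates)


def shortest_paths(graph, source):
    """Repeat one simulated MapReduce pass until no distance changes.

    graph maps each node to a list of (neighbour, weight) pairs.
    Returns (distances, number_of_passes).
    """
    dist = {node: INF for node in graph}
    dist[source] = 0
    passes = 0
    while True:
        passes += 1
        # Simulated shuffle: group all mapper output by key.
        grouped = defaultdict(list)
        for node in graph:
            for key, value in map_phase(node, dist[node], graph[node]):
                grouped[key].append(value)
        new_dist = dict(reduce_phase(k, v) for k, v in grouped.items())
        if new_dist == dist:  # stopping condition: nothing improved
            return dist, passes
        dist = new_dist
```

Each pass of the `while` loop stands in for one full job. In a real cluster the driver program would submit a new job per pass and read the previous job's output, which is exactly why such algorithms are called iterative.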
2) An identity mapper can be used (among other things!) if you only want to sort your input. An identity reducer can be used, for example, to implement embarrassingly parallel algorithms where the mappers perform the parallel tasks but you still want the output key-value pairs to be sorted.
Hope this got you on your way.
Note that apart from using an identity reducer, you can also set NO reducer at all (in which case the map output is not sorted).
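The three cases discussed above (identity mapper, identity reducer, and no reducer at all) can be compared with a small in-memory simulation. This is only a sketch in plain Python, not the Hadoop API: `run_job`, `identity_mapper`, and `identity_reducer` are hypothetical names, and the sort-by-key step stands in for the shuffle/sort that happens before the reduce phase.

```python
from itertools import groupby
from operator import itemgetter


def identity_mapper(key, value):
    """Pass each input pair through unchanged."""
    yield key, value


def identity_reducer(key, values):
    """Emit every value for a key unchanged."""
    for value in values:
        yield key, value


def run_job(records, mapper, reducer=None):
    """Simulate one job over a list of (key, value) pairs."""
    mapped = [pair for k, v in records for pair in mapper(k, v)]
    if reducer is None:
        # Map-only job: no shuffle, so output stays in mapper order.
        return mapped
    # Simulated shuffle/sort: order mapper output by key, then group.
    mapped.sort(key=itemgetter(0))
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        output.extend(reducer(key, (v for _, v in group)))
    return output
```

With both identity functions, `run_job(records, identity_mapper, identity_reducer)` just returns the input sorted by key. Passing a mapper that does real work together with `identity_reducer` gives the "parallel work, sorted output" case, and leaving `reducer` out returns the mapper output in its original, unsorted order.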