 

MapReduce: keep input ordering

I tried to implement an application using Hadoop that processes text files. The problem is that I cannot keep the ordering of the input text. Is there any way to choose the hash function? This problem could easily be solved by assigning a partition of the input to each mapper and then sending that partition to the reducers. Is this possible with Hadoop?

asked Nov 04 '22 11:11 by nikosdi



1 Answer

The basic idea of MapReduce is that the order in which things are processed is irrelevant. So you cannot (and do not need to) control the order in which:

  • the input records go through the mappers.
  • the keys and their associated values go through the reducers.
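To illustrate the points above, here is a minimal plain-Java simulation (no Hadoop dependency) of how map output is routed to reducers. The `partition` method uses the same formula as Hadoop's default `HashPartitioner`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the class and method names are hypothetical, and using each input line as the key is purely for illustration. Each reducer receives a hash-selected subset of the records, sorted by key, so the original line order is not visible to it.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation (hypothetical names, no Hadoop classes) of the shuffle routing.
class HashPartitionSketch {

    // Same formula as Hadoop's default HashPartitioner.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Routes every key to a reducer bucket, then sorts each bucket by key,
    // mimicking the sort phase: reducers see keys in sorted order, not input order.
    static List<List<String>> shuffle(List<String> keys, int numReduceTasks) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partition(key, numReduceTasks)).add(key);
        }
        for (List<String> bucket : buckets) {
            bucket.sort(null); // natural (lexicographic) ordering of the keys
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick", "brown fox", "jumps over", "the lazy dog");
        List<List<String>> buckets = shuffle(input, 3);
        for (int r = 0; r < buckets.size(); r++) {
            System.out.println("reducer " + r + ": " + buckets.get(r));
        }
    }
}
```

In a real job, the choice of reducer (though not the order) can be changed by plugging in a custom `Partitioner` with `Job.setPartitionerClass`.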

The only thing you can control is the order in which the values are placed in the iterator that is made available in the reducer. This is done using a construct called "secondary sort".
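A minimal sketch of the secondary-sort idea, again as plain Java rather than real Hadoop classes. It assumes a hypothetical composite key made of the natural key plus the input byte offset (the field we want the values ordered by). Three pieces cooperate: a partitioner that hashes only the natural key, a sort comparator over the full composite key, and grouping by the natural key alone. In a Hadoop job these roles are typically filled by a custom partitioner, sort comparator and grouping comparator registered with `Job.setPartitionerClass`, `Job.setSortComparatorClass` and `Job.setGroupingComparatorClass`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of the secondary-sort pattern (hypothetical names, no Hadoop classes).
class SecondarySortSketch {

    // Hypothetical composite key: natural key + the field used to order the values.
    record CompositeKey(String naturalKey, long offset) {}

    // Partition on the natural key only, so all records for one key meet at one reducer.
    static int partition(CompositeKey key, int numReduceTasks) {
        return (key.naturalKey().hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Sort comparator: natural key first, then offset ascending.
    static final Comparator<CompositeKey> SORT =
            Comparator.comparing(CompositeKey::naturalKey).thenComparingLong(CompositeKey::offset);

    // Sort by the full composite key, then group by natural key only, so a single
    // reduce call would see all offsets for that key already in ascending order.
    static Map<String, List<Long>> sortAndGroup(List<CompositeKey> mapOutput) {
        List<CompositeKey> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT);
        Map<String, List<Long>> groups = new LinkedHashMap<>();
        for (CompositeKey k : sorted) {
            groups.computeIfAbsent(k.naturalKey(), nk -> new ArrayList<>()).add(k.offset());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = List.of(
                new CompositeKey("fox", 40), new CompositeKey("the", 25),
                new CompositeKey("the", 0), new CompositeKey("fox", 10));
        sortAndGroup(mapOutput).forEach((key, offsets) -> System.out.println(key + " -> " + offsets));
    }
}
```

Running `main` prints `fox -> [10, 40]` followed by `the -> [0, 25]`: the values arrive in offset order even though the map output did not.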

A quick Google search for this term turns up several good starting points. I like this blog post: link

answered Nov 09 '22 12:11 by Niels Basjes
