I tried to implement an application using Hadoop that processes text files. The problem is that I cannot preserve the ordering of the input text. Is there any way to choose the hash function? This problem could easily be solved by assigning a partition of the input to each mapper and then sending that partition to the reducers. Is this possible with Hadoop?
The basic idea of MapReduce is that the order in which things are done is irrelevant. So you cannot (and do not need to) control the order in which the input splits are processed or in which the individual map and reduce tasks run.
The only thing you can control is the order in which the values are placed in the iterator that is made available in the reducer. This is done using a construct called "secondary sort".
A quick Google search for "secondary sort" turns up several good starting points. I like this blog post: link
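To make the secondary sort idea concrete, here is a minimal sketch in plain Python that only simulates the shuffle; it does not use any Hadoop API. It assumes each record carries a natural key plus its original position (for example a line number), and the names `partition`, `shuffle`, `reduce_phase` and the toy hash are hypothetical. In a real Hadoop job the same three roles are usually filled by a custom partitioner, a sort comparator and a grouping comparator set on the job.

```python
from itertools import groupby


def partition(natural_key, num_reducers):
    # Partition on the natural key ONLY, so every record for a key
    # lands on the same reducer. Toy deterministic hash (an assumption,
    # not Hadoop's default hash partitioning).
    return sum(natural_key.encode("utf-8")) % num_reducers


def shuffle(records, num_reducers):
    # records: iterable of (natural_key, position, value) in arbitrary order.
    buckets = [[] for _ in range(num_reducers)]
    for natural_key, position, value in records:
        composite_key = (natural_key, position)
        buckets[partition(natural_key, num_reducers)].append((composite_key, value))
    for bucket in buckets:
        # Sort on the full composite key: natural key first, then position.
        bucket.sort(key=lambda kv: kv[0])
    return buckets


def reduce_phase(bucket):
    # Group on the natural key only; values arrive ordered by position.
    for natural_key, group in groupby(bucket, key=lambda kv: kv[0][0]):
        yield natural_key, [value for _, value in group]


def run(records, num_reducers=2):
    result = {}
    for bucket in shuffle(records, num_reducers):
        for natural_key, values in reduce_phase(bucket):
            result[natural_key] = values
    return result


# Records arrive out of order, as they would from independent mappers.
records = [
    ("b", 2, "b-third"),
    ("a", 1, "a-second"),
    ("b", 0, "b-first"),
    ("a", 0, "a-first"),
    ("b", 1, "b-second"),
]
ordered = run(records)
```

The point of the design is that the position is moved into the key, so the framework's own sort does the ordering; the reducer never has to buffer and sort the values itself.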