I am new to the Hadoop world and am trying to learn to write code with a MapReduce mindset.
So I was following Michael Noll's tutorial.
One of the challenges I am facing (besides understanding a new framework) is the number of terminal tricks this framework uses.
So what does
$ echo "foo foo quux labs foo bar quux" | /home/hduser/mapper.py | sort -k1,1 | /home/hduser/reducer.py
mean? What does echo do?
Also, the output of the above command is:
bar 1
foo 3
labs 1
quux 2
Now, if I leave out the sort -k1,1 part, I get:
foo 2
bar 1
labs 1
foo 1
quux 2
What effect is that sort flag having? What does -k1,1 mean?
Thanks.
Reference: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
In Linux, the vertical bar (|) is a pipe: it sends the standard output of one command to the standard input of the next.
The echo command writes its arguments to standard output. So in your case, it writes foo foo quux labs foo bar quux, which is passed as input to /home/hduser/mapper.py, whose output is then passed as input to sort, and so on.
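To make the pipe idea concrete, here is a minimal Python sketch in which each stage consumes the lines produced by the previous one. The mapper function is only a hypothetical stand-in: I am assuming mapper.py emits one "word<TAB>1" line per word, which is the usual word-count pattern but is not shown in your question.

```python
def echo(text):
    """Stand-in for `echo`: write the text out as a single line."""
    yield text


def mapper(lines):
    """Hypothetical stand-in for mapper.py (assumed behavior):
    emit one 'word<TAB>1' line for every word read from the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


# Equivalent in spirit to: echo "foo foo quux labs foo bar quux" | mapper.py
# The output of echo() is fed straight in as the input of mapper().
mapped = list(mapper(echo("foo foo quux labs foo bar quux")))
print("\n".join(mapped))
```

Under that assumption, the mapper produces seven lines, one per word, in the same order as the words appeared in the input.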
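sort is a Linux command that sorts lines of text. The -k flag tells it which field (column) to use as the sort key; by default, fields are separated by whitespace. So -k1,1 tells it to build the key starting at field 1 and ending at field 1, that is, to sort on the first word of each line only. Judging by your output, the reducer only adds up identical words that arrive on consecutive lines, which is why foo is split into foo 2 and foo 1 when you leave the sort out.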
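Here is a rough Python sketch of why the sort step changes the result. Both functions are illustrative stand-ins, not the tutorial's code: sort_k1 approximates sort -k1,1 for plain lowercase ASCII words (the real command uses locale rules and, unless -s is given, breaks ties by comparing whole lines), and reduce_adjacent assumes the reducer sums counts only over runs of adjacent lines with the same word.

```python
from itertools import groupby


def sort_k1(lines):
    """Approximation of `sort -k1,1`: order lines by their first field only."""
    return sorted(lines, key=lambda line: line.split()[0])


def reduce_adjacent(lines):
    """Hypothetical reducer (assumed behavior): sum the counts of each run
    of adjacent lines that share the same word."""
    result = []
    for word, group in groupby(lines, key=lambda line: line.split("\t")[0]):
        total = sum(int(line.split("\t")[1]) for line in group)
        result.append((word, total))
    return result


# Assumed mapper output for "foo foo quux labs foo bar quux"
mapped = ["foo\t1", "foo\t1", "quux\t1", "labs\t1", "foo\t1", "bar\t1", "quux\t1"]

with_sort = reduce_adjacent(sort_k1(mapped))
without_sort = reduce_adjacent(mapped)

print(with_sort)     # [('bar', 1), ('foo', 3), ('labs', 1), ('quux', 2)]
print(without_sort)  # 'foo' appears in more than one group
```

With the sort, all lines for the same word sit next to each other, so each word is counted once. Without it, a word that shows up in separate places in the input is reported in separate pieces.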
Type man sort in your Linux terminal to learn more about the command. I hope this helps!