Lately, I have been reading a lot about MapReduce/Hadoop, and I think this is where the industry is currently heading. I want to start learning MapReduce/Hadoop, and I thought the best way to start would be to implement a small project. I tried some googling but couldn't find anything.
Can you guys give me some links, or maybe some books, that give a practical introduction to this technology? Maybe a small project that I can implement on my own to get a better understanding of it.
Thanks, Chander
Apache Spark is one of the most popular and most active Apache data processing projects. It is written in the Scala programming language and provides APIs for Python, Scala, Java, R and SQL.
A MapReduce program executes in three stages: the map stage, the shuffle stage, and the reduce stage.
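To make the three stages concrete, here is a minimal single-process sketch in plain Python that counts words. It is not Hadoop code and uses no Hadoop API; the function names (`map_stage`, `shuffle_stage`, `reduce_stage`, `word_count`) are made up for illustration, and a real cluster would spread each stage across many machines.

```python
from collections import defaultdict


def map_stage(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for text in documents:
        for word in text.lower().split():
            yield word, 1


def shuffle_stage(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_stage(groups):
    """Reduce: collapse each key's list of values into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Chain the three stages together (hypothetical helper)."""
    return reduce_stage(shuffle_stage(map_stage(documents)))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    print(word_count(docs))
```

The point to notice is that the map and reduce functions never see the whole data set at once. Only the shuffle step needs to bring together values with the same key, which is what lets a framework run the other two stages in parallel.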
Cloudera (which releases a Hadoop distribution) has some fantastic free online training videos, as well as a virtual machine with everything set up so that you can run through the examples from the training: http://www.cloudera.com/resources/?type=Training
The most common examples that get thrown around are creating an inverted index and implementing grep.
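Both of those examples are small enough to try in plain Python before touching a cluster. The sketch below is only an illustration under assumptions: `run_mapreduce` is a toy in-memory driver invented here (not a Hadoop API), the mapper and reducer names are hypothetical, and the input is assumed to be a list of `(doc_id, text)` pairs.

```python
import re
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Toy driver: map every record, group values by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Inverted index: word -> sorted list of documents that contain it.
def index_mapper(record):
    doc_id, text = record
    for word in set(text.lower().split()):
        yield word, doc_id


def index_reducer(word, doc_ids):
    return sorted(set(doc_ids))


# Grep: keep only the lines that match a regular expression.
def make_grep_mapper(pattern):
    regex = re.compile(pattern)

    def grep_mapper(record):
        doc_id, line = record
        if regex.search(line):
            yield doc_id, line

    return grep_mapper


def grep_reducer(doc_id, lines):
    # Identity reduce: the filtering already happened in the mapper.
    return lines


if __name__ == "__main__":
    docs = [
        ("a.txt", "hadoop runs mapreduce jobs"),
        ("b.txt", "spark runs in memory"),
        ("c.txt", "mapreduce jobs use hadoop"),
    ]
    print(run_mapreduce(docs, index_mapper, index_reducer))
    print(run_mapreduce(docs, make_grep_mapper("hadoop"), grep_reducer))
```

Grep is the simpler of the two because all the work is in the mapper and the reducer just passes values through. The inverted index is a better first project since it actually depends on the grouping step.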
If you're looking for more information:
A really friendly introduction can be found here. The original paper is here.
And what looks like some good example code to get you going is here.