
What are some good resources for studying Hadoop's source code?

Tags:

hadoop

Are there any good resources that would help me study Hadoop's source code? I'm particularly looking for university courses or research papers.

Tianyang Li asked Jun 17 '11 12:06




2 Answers

Studying Hadoop or MapReduce can be daunting if you try to get your hands dirty right at the start.
I followed this schedule:

  1. Start with the very basics of MapReduce at code.google.com/edu/parallel/dsd-tutorial.html and code.google.com/edu/parallel/mapreduce-tutorial.html
  2. Then watch the first two lectures at www.cs.washington.edu/education/courses/cse490h/08au/lectures.htm, a very good introductory course on MapReduce and Hadoop.
  3. Read the seminal paper at http://research.google.com/archive/mapreduce.html and the improvements in the updated version at http://www.cs.washington.edu/education/courses/cse490h/08au/readings/communications200801-dl.pdf
  4. Then watch all the other videos at the U. Washington link given above.
  5. Search YouTube for the terms "MapReduce" and "Hadoop" to find videos by O'Reilly and Google RoundTable that give a good overview of the future of Hadoop and MapReduce.
  6. Then move on to the most important videos:
    Cloudera Videos
    www.cloudera.com/resources/?media=Video
    and
    Google MiniLecture Series
    code.google.com/edu/submissions/mapreduce-minilecture/listing.html

Along with all the multimedia above, you need good written material.
Documents:

  1. The architecture diagrams at hadooper.blogspot.com are good to have on your wall.
  2. Hadoop: The Definitive Guide goes further into the nuts and bolts of the whole system, whereas Hadoop in Action is a good read with lots of teaching examples for learning Hadoop concepts. Pro Hadoop is not for beginners.
  3. PDFs of the documentation from the Apache Foundation at
    hadoop.apache.org/common/docs/current/
    and hadoop.apache.org/common/docs/stable/
    will help you learn how to model your problem as a MapReduce solution so that you gain the full advantages of Hadoop.
  4. The HDFS paper by Yahoo! Research is also a good read for gaining in-depth knowledge of Hadoop.
  5. Subscribe to the user mailing lists for Common, MapReduce and HDFS to keep up with problems, solutions and upcoming changes.
  6. Try http://developer.yahoo.com/hadoop/tutorial/module1.html for a beginner-to-expert path through Hadoop.
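
To make "modeling your problem as a MapReduce solution" a little more concrete, here is a minimal sketch of the map, shuffle and reduce phases for a word count. It is plain Python that runs in a single process, not Hadoop's API; the names map_words, reduce_counts and run_mapreduce are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce phase: combine all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for the framework (an assumption, not Hadoop).

    It applies the mapper to every record, groups the emitted values by key
    (the "shuffle" step), then applies the reducer once per key.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


result = run_mapreduce(["the quick fox", "the lazy dog"], map_words, reduce_counts)
print(result)
```

In a real cluster the grouping step is done by the framework across machines; the part you write is mostly the mapper and the reducer, which is why the tutorials above spend so much time on how to phrase a problem in those two functions.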

For Any Queries ...
Contact Apache, Google, Bing, Yahoo!

vrdmr answered Nov 02 '22 21:11


Your question seems overly broad. To find a resource to use while reading the source code, you should narrow down what you want to study. This will make it easier for you (and anyone on SO) to find papers or topics covering it.

I've dug into the Hadoop source a few times, normally with a very specific class I needed to learn about. In these cases an external resource wasn't really needed; since I had the class name, I just googled for it and found resources.

If I were to start trying to understand the Hadoop source at a higher level, I'd get the source code and my copy of Hadoop: The Definitive Guide, and use the book as a reference for understanding the higher-level connections in the source code.

I won't claim that this would be a perfect solution. H:TDG is more technical than the other Hadoop books I have, and I find it very informative. It is what I'd start with, and as I found areas I wanted to dig into more, I would search for those specifically.

QuinnG answered Nov 02 '22 20:11