
Looking for overall review on Hadoop

I am looking for performance reviews of Hadoop on a cluster of 300-600 commodity-hardware boxes, especially regarding the following aspects:

  1. Highly concurrent reads and writes
  2. Web crawling
  3. MapReduce and parallel computing
  4. Inverted index
Mickey Shine asked Nov 05 '22 20:11



1 Answer

This is not a specific question, which may be why nobody has answered it until now. Performance on a 300-600 node cluster is best analyzed with benchmarks.
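To give an idea of what such a benchmark measures, here is a minimal sketch of an inverted-index job written in the map/reduce style and timed on a single machine. It is plain Python, not the Hadoop API, and the names `map_phase`, `reduce_phase`, `build_index` and `timed`, as well as the two sample documents, are made up for illustration. Numbers from a toy like this say nothing about a real cluster; it only shows the shape of the workload you would time.

```python
import time
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit (term, doc_id) pairs, roughly what a mapper would do."""
    for term in text.lower().split():
        yield term, doc_id


def reduce_phase(pairs):
    """Group document ids by term, standing in for shuffle plus reducer."""
    index = defaultdict(set)
    for term, doc_id in pairs:
        index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def build_index(docs):
    """Run the map and reduce steps in-process over a dict of documents."""
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    )
    return reduce_phase(pairs)


def timed(fn, *args):
    """Return the function result together with elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical sample input, only for illustration.
docs = {
    1: "hadoop stores data in hdfs",
    2: "mapreduce runs on hadoop",
}
index, seconds = timed(build_index, docs)
print(index["hadoop"], f"{seconds:.6f}s")
```

On a real cluster you would run the same kind of job on your own data at increasing input sizes and node counts, and compare the wall-clock times.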

However, I found some really interesting articles about Hadoop and its use in production:

  • Hadoop Architecture and its Usage at Facebook
  • How Rackspace Now Uses MapReduce And Hadoop To Query Terabytes Of Data
  • Hadoop Sorts a Petabyte in 16.25 Hours and a Terabyte in 62 Seconds (this article contains some benchmarks)
  • A really interesting blog related to Hadoop
  • Hive - A Petabyte Scale Data Warehouse using Hadoop (another article about Facebook and Hadoop)

I hope these links get you started and give you the information you need.

Tudor Constantin answered Nov 09 '22 13:11
