
hadoop use cases in real world [closed]

Tags:

hadoop

Newbie here with Hadoop. Concept-wise it is pretty simple to understand, but one of the real challenges is modeling the problem to be solved in the map-reduce architecture. Suppose my data contains two parts (all in Oracle): 1. rather static data that doesn't change much, and 2. fresh data collected every day.

Currently the data processing basically reads the fresh data, finds and uses the corresponding static data (or metadata), applies some algorithm to it, and dumps the result back into Oracle.
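In rough Python pseudo-code, the current (non-Hadoop) processing looks something like this (the keys, metadata values, and the "algorithm" itself are all made up for illustration):

```python
# Rather static data (metadata) keyed by id, and fresh rows collected daily.
# These stand in for the two Oracle tables; names and values are hypothetical.
static_metadata = {"id1": "meta-A", "id2": "meta-B"}
fresh_rows = [("id1", 10), ("id1", 20), ("id2", 5)]

def process(fresh_rows, static_metadata):
    results = []
    for key, value in fresh_rows:
        meta = static_metadata[key]             # find the corresponding static data
        results.append((key, meta, value * 2))  # apply some algorithm (here: doubling)
    return results                              # then dump back to Oracle

processed = process(fresh_rows, static_metadata)
```

The question is how to express that lookup-then-compute loop as map and reduce steps.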

How do I model such an application paradigm? Do I store the static data in the distributed cache? What if that data is pretty big?

Basically I am looking for more examples like the following: http://stevekrenzel.com/finding-friends-with-mapreduce

Thanks,

step-by-step asked Jan 25 '13


3 Answers

Basically the requirement is to do a join on two data sets. MapReduce programming requires a different way of thinking than normal programming. Here are some references on joins and other patterns on top of MapReduce:

  1. Data-Intensive Text Processing with MapReduce

  2. MapReduce Design Patterns

  3. Section 8.3 in Hadoop - The Definitive Guide

Coming back to the join, it can be done in multiple ways depending on the amount of data and how the data is laid out. The references above cover these in more detail.
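As a minimal illustration of one of these techniques, the reduce-side join, here is a plain-Python simulation of the map, shuffle, and reduce steps (not actual Hadoop code; the keys and values are hypothetical):

```python
from itertools import groupby
from operator import itemgetter

# Two data sets sharing a join key, as in the question: static metadata
# and fresh daily records. Names and values are made up.
static = [("id1", "meta-A"), ("id2", "meta-B")]
fresh = [("id1", 10), ("id1", 20), ("id2", 5)]

def map_phase(static, fresh):
    # Tag each record with its source so the reducer can tell them apart.
    for key, meta in static:
        yield key, ("S", meta)
    for key, value in fresh:
        yield key, ("F", value)

def reduce_phase(mapped):
    # Shuffle: group all tagged values by key, as Hadoop does between phases.
    for key, group in groupby(sorted(mapped, key=itemgetter(0)), key=itemgetter(0)):
        values = [v for _, v in group]
        meta = next(m for tag, m in values if tag == "S")
        for tag, v in values:
            if tag == "F":
                yield key, meta, v  # joined record: apply the algorithm here

joined = list(reduce_phase(map_phase(static, fresh)))
```

In real Hadoop the grouping and sorting happen in the framework's shuffle; the mapper and reducer only emit and consume key/value pairs. If the static side is small enough, a map-side join (loading it into memory in each mapper, e.g. via the distributed cache) avoids the shuffle entirely.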

Praveen Sripati answered Oct 01 '22


We are collecting real-life use cases here: http://hadoopilluminated.com/hadoop_book/Hadoop_Use_Cases.html

We already have good coverage of multiple domains, and will continue to add to it.

(Disclaimer: I am a co-author of this free Hadoop book.)

Sujee Maniyam answered Oct 01 '22


I would look at the following article about MapReduce patterns, which should give you a nice idea of common algorithms and how they translate into the MapReduce world.

More generally, I don't think there's a magical formula for translating a problem into a set of MapReduce jobs; you have to ask yourself questions that vary from dataset to dataset. Looking at existing examples is a good thing, and you should definitely try to implement something on a little toy problem.

Also, if you have trouble abstracting your problem into a set of MapReduce jobs, you could use Hive, for example, which works much like a relational database with a few tweaks and generates the MapReduce jobs for you, so you don't have to worry too much about what happens underneath.

Charles Menguy answered Oct 03 '22