Hadoop use cases in the real world [closed]

Question

I'm new to Hadoop. Conceptually it is pretty simple to understand; however, one of the real challenges is how to model the problem to be solved in the MapReduce architecture. Suppose my data has two parts (all in Oracle):

1. Fairly static data that doesn't change much
2. Fresh data collected every day

Currently, the processing reads the fresh data, finds the corresponding static data (or metadata), applies some algorithm to it, and writes the result back to Oracle.

How do I model this kind of application? Should I store the static data in the distributed cache? What if that data is pretty big?

Basically I am looking for more examples like the following: http://stevekrenzel.com/finding-friends-with-mapreduce

Thanks,

Praveen Sripati · Accepted Answer

Basically, the requirement is to do a join on two data sets. MapReduce programming requires a different way of thinking than conventional programming. Here are some references on joins and other patterns on top of MapReduce:

Data-Intensive Text Processing with MapReduce
MapReduce Design Patterns
Section 8.3 in Hadoop - The Definitive Guide

Coming back to the join: it can be done in multiple ways, depending on the amount of data and how the data is organized. The references above cover this in more detail.
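As a rough illustration of one of those ways, here is a minimal sketch of a reduce-side join, simulated in plain Python rather than on a real Hadoop cluster. The sample rows, the "S"/"F" tags, and all function names are hypothetical and only stand in for the static and fresh data from the question. Each mapper tags its records with their source, the shuffle groups them by join key, and the reducer pairs the two sides.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the two Oracle tables.
STATIC_ROWS = [("dev1", "sensor-A"), ("dev2", "sensor-B")]
FRESH_ROWS = [("dev1", 10.5), ("dev2", 7.0), ("dev1", 11.0), ("dev3", 1.0)]


def map_static(row):
    """Tag each static row so the reducer can tell the sources apart."""
    key, metadata = row
    yield key, ("S", metadata)


def map_fresh(row):
    """Tag each fresh row with a different marker."""
    key, reading = row
    yield key, ("F", reading)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key, values):
    """Pair every fresh value with the static metadata for the same key."""
    static = [v for tag, v in values if tag == "S"]
    fresh = [v for tag, v in values if tag == "F"]
    for metadata in static:
        for reading in fresh:
            yield key, metadata, reading


def run_join(static_rows, fresh_rows):
    """Run map, shuffle, and reduce in memory and return the joined rows."""
    pairs = []
    for row in static_rows:
        pairs.extend(map_static(row))
    for row in fresh_rows:
        pairs.extend(map_fresh(row))
    joined = []
    for key, values in sorted(shuffle(pairs).items()):
        joined.extend(reduce_join(key, values))
    return joined


if __name__ == "__main__":
    for record in run_join(STATIC_ROWS, FRESH_ROWS):
        print(record)
```

This behaves like an inner join: the fresh row for "dev3" has no static match and is dropped. A reduce-side join does not require either data set to fit in memory on the map side, which is why it is a common choice when the static data is large. When the static data is small, loading it into a lookup table on each mapper (for example via the distributed cache) avoids the shuffle entirely.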

Sujee Maniyam · Answer

We are collecting real-life use cases here: http://hadoopilluminated.com/hadoop_book/Hadoop_Use_Cases.html

We already have good coverage of multiple domains and will continue to add to it.

(Disclaimer: I am a co-author of this free Hadoop book.)

Charles Menguy · Answer

I would look at the following article about Map/Reduce patterns, which should give you a good idea of common algorithms and how they translate to the Map/Reduce world.

More generally, I don't think there's a magic formula for translating a problem into a set of Map/Reduce jobs. You have to ask yourself questions that vary from dataset to dataset. Looking at existing examples is a good thing, and you should definitely try to implement something on a small toy problem.
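To give an idea of what such a toy problem could look like, here is a minimal sketch of word counting written as a map step and a reduce step in plain Python. It runs in memory without Hadoop, and the function names are hypothetical, so treat it as a way to practise the mapper/reducer split rather than as real job code.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_word_count(lines):
    """Group mapper output by key, then reduce each group."""
    groups = defaultdict(list)
    for line in lines:
        for word, count in map_words(line):
            groups[word].append(count)
    return dict(reduce_counts(word, counts) for word, counts in groups.items())


if __name__ == "__main__":
    sample = ["Hadoop runs MapReduce jobs", "Hive generates MapReduce jobs"]
    print(run_word_count(sample))
```

Once the split between the two steps feels natural on something this small, the same exercise can be repeated on a slice of the real data.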

Also, if you have trouble abstracting your problem into a set of Map/Reduce jobs, you could use a tool such as Hive, which works like a relational database with a few tweaks and generates the Map/Reduce jobs for you, so you don't have to worry too much about what happens underneath.

Tags:

hadoop

step-by-step