
Chaining multiple mapreduce tasks in Hadoop streaming

I have a scenario with two MapReduce jobs. I am more comfortable with Python, so I plan to write the MapReduce scripts in it and run them with Hadoop streaming. Is there a convenient way to chain the two jobs in the following form when Hadoop streaming is used?

Map1 -> Reduce1 -> Map2 -> Reduce2

I've heard of many ways to accomplish this in Java, but I need something for Hadoop streaming.

Varadharajan Mukundan asked Jan 07 '11 14:01





1 Answer

Here is a great blog post on how to use Cascading with Hadoop streaming: http://www.xcombinator.com/2009/11/18/how-to-use-cascading-with-hadoop-streaming/

The value here is that you can mix Java (Cascading query flows) with your custom streaming operations in the same application. I find this much less brittle than other methods.

Note that the Cascade object in Cascading lets you chain multiple Flows (following the blog post above, your streaming job would become a MapReduceFlow).
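For comparison, a chain of the form Map1 -> Reduce1 -> Map2 -> Reduce2 can also be driven without Cascading by launching the streaming jobs one after another from a small Python script, with each job's output directory used as the next job's input. The sketch below is not taken from the blog post. The streaming jar location, the HDFS paths, the temporary directory prefix, and the script names (map1.py and so on) are all placeholder assumptions. The scripts are assumed to be executable and to start with a shebang line.

```python
import subprocess

# Assumed location of the streaming jar; it differs between Hadoop versions and installs.
STREAMING_JAR = "/usr/lib/hadoop/contrib/streaming/hadoop-streaming.jar"


def streaming_command(mapper, reducer, input_path, output_path, jar=STREAMING_JAR):
    """Build the argument list for a single Hadoop streaming job."""
    return [
        "hadoop", "jar", jar,
        # Generic option that ships the scripts to the cluster;
        # it has to come before the streaming-specific options.
        "-files", f"{mapper},{reducer}",
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]


def chain_commands(stages, input_path, final_output, tmp_prefix="tmp/chain_stage"):
    """Return one command per (mapper, reducer) stage.

    Each stage reads the directory written by the previous stage.
    Intermediate directories are named with the hypothetical tmp_prefix.
    """
    commands = []
    current_input = input_path
    for index, (mapper, reducer) in enumerate(stages, start=1):
        is_last = index == len(stages)
        output = final_output if is_last else f"{tmp_prefix}_{index}"
        commands.append(streaming_command(mapper, reducer, current_input, output))
        current_input = output
    return commands


def run_chain(commands, runner=subprocess.run):
    """Run the jobs in order; check=True raises and stops the chain if a job fails."""
    for command in commands:
        runner(command, check=True)
```

Calling `run_chain(chain_commands([("map1.py", "reduce1.py"), ("map2.py", "reduce2.py")], "input", "output"))` would submit the two jobs sequentially. This approach has no dependency tracking or cleanup of intermediate directories, which is part of what a tool such as Cascading handles for you.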

Disclaimer: I'm the author of Cascading

cwensel answered Oct 02 '22 19:10
