
What is the difference between job.submit and job.waitForCompletion in Apache Hadoop?

I have read the documentation so I know the difference.

My question, however, is this: is there any risk in using .submit instead of .waitForCompletion if I want to run several Hadoop jobs on a cluster in parallel?

I mostly use Elastic MapReduce.

When I tried doing so, I noticed that only the first job was being executed.

Eastern Monk asked May 22 '13 21:05



1 Answer

If your aim is to run jobs in parallel, there is no risk in using job.submit(): it hands the job to the cluster and returns immediately. job.waitForCompletion() exists because its call returns only once the job has finished, and it returns the job's success or failure status, which you can use to decide whether further steps should run.
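To make the "submit everything, then check status yourself" pattern concrete, here is a minimal sketch in plain Java. It does not use the Hadoop classes directly: JobHandle is a hypothetical interface that mirrors only the shape of Job.submit(), Job.isComplete() and Job.isSuccessful(), and FakeJob is an in-memory stand-in so the example runs without a cluster. With real Hadoop you would presumably wrap or replace these with org.apache.hadoop.mapreduce.Job instances; treat the names and the polling interval as assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class ParallelJobs {

    /** Hypothetical stand-in for the parts of Hadoop's Job API used here. */
    interface JobHandle {
        void submit() throws Exception;           // non-blocking, like Job.submit()
        boolean isComplete() throws Exception;    // like Job.isComplete()
        boolean isSuccessful() throws Exception;  // like Job.isSuccessful()
    }

    /** Submits every job without blocking, then polls until all have finished. */
    static boolean submitAllAndWait(List<JobHandle> jobs, long pollMillis) throws Exception {
        // Submit first, so that all jobs are queued on the cluster together.
        for (JobHandle job : jobs) {
            job.submit();
        }
        List<JobHandle> pending = new ArrayList<>(jobs);
        boolean allSucceeded = true;
        while (!pending.isEmpty()) {
            List<JobHandle> stillRunning = new ArrayList<>();
            for (JobHandle job : pending) {
                if (!job.isComplete()) {
                    stillRunning.add(job);
                } else if (!job.isSuccessful()) {
                    allSucceeded = false;
                }
            }
            pending = stillRunning;
            if (!pending.isEmpty()) {
                Thread.sleep(pollMillis);
            }
        }
        return allSucceeded;
    }

    /** In-memory fake that reports completion after a fixed number of polls. */
    static class FakeJob implements JobHandle {
        private final int pollsUntilDone;
        private final boolean succeeds;
        private boolean submitted = false;
        private int polls = 0;

        FakeJob(int pollsUntilDone, boolean succeeds) {
            this.pollsUntilDone = pollsUntilDone;
            this.succeeds = succeeds;
        }

        public void submit() {
            submitted = true;
        }

        public boolean isComplete() {
            if (!submitted) {
                throw new IllegalStateException("job was never submitted");
            }
            polls++;
            return polls >= pollsUntilDone;
        }

        public boolean isSuccessful() {
            return succeeds;
        }
    }

    public static void main(String[] args) throws Exception {
        List<JobHandle> jobs = new ArrayList<>();
        jobs.add(new FakeJob(1, true));
        jobs.add(new FakeJob(3, true));
        System.out.println("all succeeded: " + submitAllAndWait(jobs, 10L));
    }
}
```

The point of the design is that nothing blocks between submissions, so the cluster sees all jobs at once. The driver still gets the success/failure information that waitForCompletion would have returned, only collected at the end instead of per job.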

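As for seeing only the first job being executed: by default Hadoop schedules jobs in FIFO order, so a job that arrives first and occupies all available task slots makes the later jobs wait until slots free up. You can certainly change this behaviour, for example by switching to a different scheduler such as the Fair Scheduler or the Capacity Scheduler. Read more here.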

Amar answered Oct 04 '22 02:10
