 

Elastic Map Reduce: difference between CANCEL_AND_WAIT and CONTINUE?

I just found that using Amazon's Elastic Map Reduce, I can specify a step to have one of three ActionOnFailure choices:

  • TERMINATE_JOB_FLOW
  • CANCEL_AND_WAIT
  • CONTINUE

TERMINATE_JOB_FLOW is the default and obvious - it shuts down the entire cluster upon a failure in the step.

What is the difference between CANCEL_AND_WAIT and CONTINUE? It appears to me that both will keep the cluster running and simply move on to the next step when it is added.

Suman asked Mar 07 '13 21:03




1 Answer

Say you have launched a cluster and added following 3 steps to it:

  • Step1
  • Step2
  • Step3

Now, if Step1 has ActionOnFailure set to CANCEL_AND_WAIT, then if Step1 fails, all the remaining steps are cancelled and the cluster goes into a Waiting state. And I guess if you launch your cluster with the --stay-alive option, then this is the default behaviour.

If Step1 has ActionOnFailure set to CONTINUE, then if Step1 fails, execution continues with Step2.

If Step1 has ActionOnFailure set to TERMINATE_JOB_FLOW, then if Step1 fails, the cluster is shut down, as you mentioned.
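To make the three cases concrete, here is a small toy model of the behaviour described above. It is only a sketch and does not call EMR: the function `run_steps`, the tuple layout, and the status strings are made up for illustration, and it assumes a cluster that stays alive (Waiting) once its steps are done.

```python
# Toy model of ActionOnFailure semantics; NOT the EMR API.
# run_steps, the (name, action, succeeds) tuples and the status strings
# are hypothetical names chosen for this illustration.

VALID_ACTIONS = {"TERMINATE_JOB_FLOW", "CANCEL_AND_WAIT", "CONTINUE"}


def run_steps(steps):
    """Simulate a queue of steps.

    steps: list of (name, action_on_failure, succeeds) tuples.
    Returns (statuses, cluster_state), where statuses maps each step
    name to "COMPLETED", "FAILED" or "CANCELLED".
    """
    for _, action, _ in steps:
        if action not in VALID_ACTIONS:
            raise ValueError(f"unknown ActionOnFailure: {action}")

    statuses = {}
    cluster_state = "WAITING"  # assumption: a keep-alive cluster
    cancel_rest = False

    for name, action, succeeds in steps:
        if cancel_rest:
            # An earlier failure cancelled everything still queued.
            statuses[name] = "CANCELLED"
        elif succeeds:
            statuses[name] = "COMPLETED"
        else:
            statuses[name] = "FAILED"
            if action == "CONTINUE":
                # Ignore the failure and move on to the next step.
                continue
            cancel_rest = True
            if action == "TERMINATE_JOB_FLOW":
                cluster_state = "TERMINATED"
            # CANCEL_AND_WAIT: the cluster stays up in WAITING.

    return statuses, cluster_state
```

With Step1 failing, `CONTINUE` still lets Step2 and Step3 run, while `CANCEL_AND_WAIT` cancels them but leaves the cluster up, so you can add new steps later. `TERMINATE_JOB_FLOW` cancels them and shuts the cluster down.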

Amar answered Nov 02 '22 14:11