Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Amazon EMR --wait-for-steps

I'm writing a bash script that runs an aws emr command (aws emr version 1.5.2).

How do I tell my script to wait until the emr job finishes before proceeding? The --wait-for-steps option is depreciated now.

via jq I got this, but it just seems like the wrong approach:

STEP_STATUS_STATE=$(aws emr list-steps --cluster-id ${CLUSTER_ID} | jq '.Steps[0].Status.State' | tr -d '"')
while [[ ${STEP_STATUS_STATE} == PENDING ]] || [[ ${STEP_STATUS_STATE} == RUNNING ]]; do
    STEP_STATUS_STATE=$(aws emr list-steps --cluster-id ${CLUSTER_ID} | jq '.Steps[0].Status.State' | tr -d '"')
    echo $(date) ${STEP_STATUS_STATE}
    sleep 10
done
like image 964
dranxo Avatar asked Nov 04 '14 23:11

dranxo


People also ask

How long does an EMR cluster take to start?

Start the EMR Cluster Once you've finished configuring your cluster, you can start it with the Create Cluster button. This may take a while (~10 mins), depending on your settings.

How long does it take to spin up an EMR cluster?

On average, our clusters required 8–10 minutes to bootstrap and source the Spot Instances requested.

How do you add steps to EMR?

Add steps to a running clusterOn the Cluster List page, select the link for your cluster. On the Cluster Details page, choose the Steps tab. On the Steps tab, choose Add step. Type appropriate values in the fields in the Add Step dialog, and then choose Add.

What is EMR step function?

Each EMR step is a unit of work that contains instructions to manipulate data for processing by software installed on the cluster, including tools such as Apache Spark, Hive, or Presto.


2 Answers

I use AWS java api to wait till job is finished like below. Hope this helps

 public static final List<JobFlowExecutionState> DONE_STATES = Arrays
        .asList(new JobFlowExecutionState[] {
                JobFlowExecutionState.COMPLETED,
                JobFlowExecutionState.FAILED,
                JobFlowExecutionState.TERMINATED });

...

  public static boolean isDone(String value) {
    JobFlowExecutionState state = JobFlowExecutionState.fromValue(value);
    return Constants.DONE_STATES.contains(state);
}

   .
   .
  STATUS_LOOP: while (true) {
        DescribeJobFlowsRequest desc = new DescribeJobFlowsRequest(
                Arrays.asList(new String[] { result.getJobFlowId() }));
        DescribeJobFlowsResult descResult = emr.describeJobFlows(desc);
        for (JobFlowDetail detail : descResult.getJobFlows()) {
            String state = detail.getExecutionStatusDetail().getState();
            if (isDone(state)) {
                logger.info("Job " + state + ": " + detail.toString());

                if(loadToDailyDB && state.equalsIgnoreCase("COMPLETED"))
                {

                    //Do something
                }
                if(!state.equalsIgnoreCase("COMPLETED"))
                {

                }

                break STATUS_LOOP;
            } else if (!lastState.equals(state)) {
                lastState = state;
                logger.info("Job " + state + " at "
                        + new Date().toString());
            }
        }
        Thread.sleep(75000);
like image 94
Sandesh Deshmane Avatar answered Sep 20 '22 11:09

Sandesh Deshmane


Try using AWS emr wait step-complete option. Have a reference here.

like image 44
Dhawal shah Avatar answered Sep 18 '22 11:09

Dhawal shah