I'm writing a bash script that runs an aws emr command (aws emr version 1.5.2).
How do I tell my script to wait until the emr job finishes before proceeding? The --wait-for-steps
option is depreciated now.
via jq I got this, but it just seems like the wrong approach:
STEP_STATUS_STATE=$(aws emr list-steps --cluster-id ${CLUSTER_ID} | jq '.Steps[0].Status.State' | tr -d '"')
while [[ ${STEP_STATUS_STATE} == PENDING ]] || [[ ${STEP_STATUS_STATE} == RUNNING ]]; do
STEP_STATUS_STATE=$(aws emr list-steps --cluster-id ${CLUSTER_ID} | jq '.Steps[0].Status.State' | tr -d '"')
echo $(date) ${STEP_STATUS_STATE}
sleep 10
done
Start the EMR Cluster Once you've finished configuring your cluster, you can start it with the Create Cluster button. This may take a while (~10 mins), depending on your settings.
On average, our clusters required 8–10 minutes to bootstrap and source the Spot Instances requested.
Add steps to a running clusterOn the Cluster List page, select the link for your cluster. On the Cluster Details page, choose the Steps tab. On the Steps tab, choose Add step. Type appropriate values in the fields in the Add Step dialog, and then choose Add.
Each EMR step is a unit of work that contains instructions to manipulate data for processing by software installed on the cluster, including tools such as Apache Spark, Hive, or Presto.
I use AWS java api to wait till job is finished like below. Hope this helps
public static final List<JobFlowExecutionState> DONE_STATES = Arrays
.asList(new JobFlowExecutionState[] {
JobFlowExecutionState.COMPLETED,
JobFlowExecutionState.FAILED,
JobFlowExecutionState.TERMINATED });
...
public static boolean isDone(String value) {
JobFlowExecutionState state = JobFlowExecutionState.fromValue(value);
return Constants.DONE_STATES.contains(state);
}
.
.
STATUS_LOOP: while (true) {
DescribeJobFlowsRequest desc = new DescribeJobFlowsRequest(
Arrays.asList(new String[] { result.getJobFlowId() }));
DescribeJobFlowsResult descResult = emr.describeJobFlows(desc);
for (JobFlowDetail detail : descResult.getJobFlows()) {
String state = detail.getExecutionStatusDetail().getState();
if (isDone(state)) {
logger.info("Job " + state + ": " + detail.toString());
if(loadToDailyDB && state.equalsIgnoreCase("COMPLETED"))
{
//Do something
}
if(!state.equalsIgnoreCase("COMPLETED"))
{
}
break STATUS_LOOP;
} else if (!lastState.equals(state)) {
lastState = state;
logger.info("Job " + state + " at "
+ new Date().toString());
}
}
Thread.sleep(75000);
Try using AWS emr wait step-complete option. Have a reference here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With