The Amazon EMR documentation on adding steps to a cluster says that a single Elastic MapReduce step can submit several jobs to Hadoop. However, the Amazon EMR documentation for step configuration suggests that a single step can accommodate just one execution of hadoop-streaming.jar (that is, HadoopJarStep is a single HadoopJarStepConfig rather than an array of HadoopJarStepConfigs).
What is the proper syntax for submitting several jobs to Hadoop in a step?
As the Amazon EMR documentation shows, you can create a cluster that runs a script my_script.sh on the master instance in a step:
aws emr create-cluster --name "Test cluster" --ami-version 3.11 --use-default-roles \
--ec2-attributes KeyName=myKey --instance-type m3.xlarge --instance-count 3 \
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://mybucket/script-path/my_script.sh"]
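Note that the step's Args points at s3://mybucket/script-path/my_script.sh, so the script has to be uploaded to that location before the step runs. A minimal sketch, reusing the hypothetical bucket name from the command above:

aws s3 cp my_script.sh s3://mybucket/script-path/my_script.sh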
my_script.sh should look something like this:
#!/usr/bin/env bash
hadoop jar my_first_step.jar [mainClass] args... &
hadoop jar my_second_step.jar [mainClass] args... &
# ...additional hadoop jar commands, each backgrounded with &...
wait
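The trailing & backgrounds each hadoop jar invocation so the jobs run concurrently, and wait keeps the step alive until all of them have finished. If the jobs need to run one after another instead (say, the second reads the first's output), a sequential sketch of the same script, with the jar names still placeholders, could be:

#!/usr/bin/env bash
set -e  # fail the step as soon as any job fails
hadoop jar my_first_step.jar [mainClass] args...
hadoop jar my_second_step.jar [mainClass] args...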
This way, multiple jobs are submitted to Hadoop in the same step, but unfortunately the EMR interface won't be able to track them. To track them, use the Hadoop web interfaces as shown here, or simply ssh to the master instance and explore with mapred job.
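For example, assuming the key pair from the create-cluster command above and a placeholder public DNS name for the master instance (hadoop is the default login user on EMR AMIs):

ssh -i myKey.pem hadoop@ec2-xx-xx-xx-xx.compute-1.amazonaws.com
mapred job -list              # list running jobs and their IDs
mapred job -status <job-id>   # check the progress of a specific job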