How do I run Spark jobs concurrently in the same AWS EMR cluster?

Is it possible to submit and run Spark jobs concurrently in the same AWS EMR cluster? If yes, could you please elaborate?

Kunal asked May 09 '18 05:05

2 Answers

You should use the --deploy-mode cluster option, which lets you submit multiple applications to your cluster. YARN will then handle the resources and the queues for you.

The full example:

# --deploy-mode can also be client for client mode
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

More details in the official spark-submit documentation.
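To actually run jobs concurrently, you can launch several spark-submit commands and let YARN schedule them side by side. A minimal sketch (jar paths and class names like com.example.JobA are placeholders, not real artifacts), assuming you run it from the EMR master node with enough cluster capacity for both applications:

```shell
# Submit two applications in cluster mode; each spark-submit launcher is
# backgrounded with & so the second submission does not wait for the first
# job to finish. YARN accepts both and runs them concurrently if resources allow.
spark-submit --master yarn --deploy-mode cluster \
  --class com.example.JobA /path/to/job-a.jar &

spark-submit --master yarn --deploy-mode cluster \
  --class com.example.JobB /path/to/job-b.jar &

# Wait for both launcher processes to return before the script exits.
wait
```

In cluster mode the launcher exits once the application is handed to YARN, so backgrounding it only overlaps the submissions; the concurrency itself is managed by YARN's scheduler and queues.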

Thiago Baldim answered Oct 31 '22 20:10


Currently, EMR doesn't support running multiple steps in parallel; queued steps are executed one at a time. As far as I know, such an experimental feature has already been implemented but not released due to some issues.
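For context, EMR steps are added to a cluster's queue like this; as noted above, EMR then works through the queue sequentially. A minimal sketch using the AWS CLI (the cluster ID and jar path are placeholders):

```shell
# Queue a Spark step on an existing EMR cluster. Adding a second step
# while this one runs does not make them concurrent; it just lengthens
# the queue, which is why concurrent jobs are usually submitted via
# spark-submit on the master node instead of as separate steps.
aws emr add-steps \
  --cluster-id j-XXXXXXXXXXXXX \
  --steps Type=Spark,Name="SparkPi",ActionOnFailure=CONTINUE,Args=[--class,org.apache.spark.examples.SparkPi,/path/to/examples.jar,1000]
```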

Yuriy Bondaruk answered Oct 31 '22 18:10