I know that Spark applications can be executed on YARN using spark-submit --master yarn.
The question is: is it possible to run a Spark application on YARN using the yarn command?
If so, the YARN REST API could be used as an interface for running Spark and MapReduce applications in a uniform way.
I see this question is a year old, but to anyone else who stumbles across it: this should be possible now. I've been trying to do something similar and have been following the Starting Spark jobs directly via YARN REST API tutorial from Hortonworks.
Essentially what you need to do is upload your jar to HDFS, create a Spark job JSON file per the YARN REST API documentation, and then use a curl command to start the application. An example of that command is:
curl -s -i -X POST -H "Content-Type: application/json" ${HADOOP_RM}/ws/v1/cluster/apps \
  --data-binary @spark-yarn.json
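Note that the request body has to carry an application ID, which you obtain first with a POST to ${HADOOP_RM}/ws/v1/cluster/apps/new-application. Below is a minimal sketch of what spark-yarn.json might look like, assuming a Spark 1.x build whose jars are already in HDFS. The HDFS paths, class names, memory sizes, and the exact ApplicationMaster command line and environment entries are illustrative placeholders; the Hortonworks tutorial walks through the full set you actually need:

{
  "application-id": "application_1484231633049_0001",
  "application-name": "my-spark-app",
  "application-type": "SPARK",
  "am-container-spec": {
    "local-resources": {
      "entry": [
        {
          "key": "__app__.jar",
          "value": {
            "resource": "hdfs://namenode:8020/apps/my-spark-app.jar",
            "type": "FILE",
            "visibility": "APPLICATION",
            "size": 123456,
            "timestamp": 1484231633000
          }
        }
      ]
    },
    "environment": {
      "entry": [
        { "key": "SPARK_YARN_MODE", "value": "true" }
      ]
    },
    "commands": {
      "command": "{{JAVA_HOME}}/bin/java -Xmx512m org.apache.spark.deploy.yarn.ApplicationMaster --class com.example.Main --jar __app__.jar 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr"
    }
  },
  "resource": {
    "memory": 1024,
    "vCores": 1
  },
  "max-app-attempts": 2,
  "unmanaged-AM": false
}

Be aware that the size and timestamp of each local resource must match the file as it actually sits in HDFS, and the ApplicationMaster's classpath (the Spark assembly jar, the Hadoop configuration) also has to be wired in through local-resources and a CLASSPATH environment entry.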
Like all YARN applications, Spark implements a Client and an ApplicationMaster when deploying on YARN. If you look at the implementation in the Spark repository, you'll get an idea of how to create your own Client/ApplicationMaster: https://github.com/apache/spark/tree/master/yarn/src/main/scala/org/apache/spark/deploy/yarn . But out of the box it does not seem possible.
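For what it's worth, here is a rough sketch of driving Spark's YARN Client programmatically instead of going through spark-submit. It is written against the Spark 1.x yarn module linked above; Client and ClientArguments are marked private[spark] in many releases, so treat the class and config names below as illustrative rather than a supported API:

import org.apache.spark.SparkConf
import org.apache.spark.deploy.yarn.{Client, ClientArguments}

object YarnSubmitSketch {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
      .setAppName("my-spark-app")
      // Point Spark at an assembly jar already uploaded to HDFS
      // (the path is a placeholder for this sketch).
      .set("spark.yarn.jar", "hdfs:///apps/spark/spark-assembly.jar")

    // The same flags the spark-submit YARN path passes internally:
    // the application jar, its main class, and its arguments.
    val clientArgs = new ClientArguments(Array(
      "--jar", "hdfs:///apps/my-spark-app.jar",
      "--class", "com.example.Main",
      "--arg", "some-input-path"
    ), sparkConf)

    // Submits the ApplicationMaster to YARN and waits for completion.
    new Client(clientArgs, sparkConf).run()
  }
}

Because of the private[spark] visibility, code like this typically has to live inside the org.apache.spark.deploy.yarn package itself, which is why the out-of-the-box answer above still stands.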