Submit & kill a Spark application programmatically from another application

Tags:

apache-spark

I am wondering whether it is possible to submit, monitor, and kill Spark applications from another service.

My requirements are as follows:

I wrote a service that:

  1. parses user commands,
  2. translates them into arguments for an already prepared Spark-SQL application,
  3. submits the application along with those arguments to the Spark cluster using spark-submit launched from ProcessBuilder (a sketch of the assembled command follows this list), and
  4. plans to run the generated applications' drivers in cluster mode.
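For concreteness, the command my service assembles looks roughly like the sketch below; the class name, master URL, and jar path are placeholders for the real Spark-SQL application:

#!/bin/bash

# Roughly the command built in step 3; class name, master URL and jar path
# are placeholders, and "$@" stands for the arguments translated in step 2.
spark-submit --class com.example.SparkSqlApp \
        --master spark://<master url>:7077 \
        --deploy-mode cluster \
        hdfs:///apps/spark-sql-app.jar "$@"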

The other requirements are:

  • query the application's status, for example the percentage of work remaining
  • kill queries accordingly

What I found in the Spark standalone documentation suggests killing an application using:

./bin/spark-class org.apache.spark.deploy.Client kill <master url> <driver ID>

and that I should find the driver ID through the standalone Master web UI at http://<master url>:8080.

So, what am I supposed to do?

Related SO questions:
Spark application finished callback
Deploy Apache Spark application from another application in Java, best practice

asked May 01 '15 by yjshen

1 Answer

You could use shell scripts to do this.

The deploy script:

#!/bin/bash

# Submit in cluster mode and capture spark-submit's log output in a file;
# the master URL is assumed to come from spark-defaults.conf here.
spark-submit --class "xx.xx.xx" \
        --deploy-mode cluster \
        --supervise \
        --executor-memory 6G hdfs:///spark-stat.jar > output 2>&1

cat output

and you will get output like this:

16/06/23 08:37:21 INFO rest.RestSubmissionClient: Submitting a request to launch an application in spark://node-1:6066.
16/06/23 08:37:22 INFO rest.RestSubmissionClient: Submission successfully created as driver-20160623083722-0026. Polling submission state...
16/06/23 08:37:22 INFO rest.RestSubmissionClient: Submitting a request for the status of submission driver-20160623083722-0026 in spark://node-1:6066.
16/06/23 08:37:22 INFO rest.RestSubmissionClient: State of driver driver-20160623083722-0026 is now RUNNING.
16/06/23 08:37:22 INFO rest.RestSubmissionClient: Driver is running on worker worker-20160621162532-192.168.1.200-7078 at 192.168.1.200:7078.
16/06/23 08:37:22 INFO rest.RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20160623083722-0026",
  "serverSparkVersion" : "1.6.0",
  "submissionId" : "driver-20160623083722-0026",
  "success" : true
}

Based on this output, create your kill-driver script:

#!/bin/bash

# Extract the driver ID that spark-submit logged, then ask the master to kill it.
driverid=$(grep -Po 'driver-\d+-\d+' output | head -n 1)

spark-submit --master spark://node-1:6066 --kill "$driverid"

Make sure to give the scripts execute permission using chmod +x.
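The captured output also covers the status-query requirement. The standalone REST gateway at spark://node-1:6066 can be polled with spark-submit's --status flag; a minimal sketch, reusing the driver ID extracted above (note it reports the driver's state such as RUNNING or FINISHED, not a progress percentage; for per-job progress you would have to query the application's own web UI):

#!/bin/bash

# Poll the driver's state (SUBMITTED, RUNNING, FINISHED, ...) via the
# standalone REST gateway, using the driver ID captured from the output file.
driverid=$(grep -Po 'driver-\d+-\d+' output | head -n 1)

spark-submit --master spark://node-1:6066 --status "$driverid"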

answered Sep 29 '22 by pinkdawn