 

Stopping a Running Spark Application

Tags:

apache-spark

I'm running a Spark cluster in standalone mode.

I've submitted a Spark application in cluster mode using options:

--deploy-mode cluster --supervise

So that the job is fault tolerant.
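
A full submit command along these lines would look roughly like the following (the master URL, main class, and jar path are placeholders, not the actual values used here):

./bin/spark-submit \
  --master spark://<master-host>:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.MyJob \
  /path/to/my-job.jar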

Now I need to keep the cluster running but stop the application from running.

Things I have tried:

  • Stopping the cluster and restarting it, but the application resumes execution when I do that.
  • Running kill -9 on the DriverWrapper daemon, but the job resumes again after that.
  • Removing the temporary files and directories and restarting the cluster, but the job resumes again.

So the running application is really fault tolerant.

Question: Given the above scenario, can someone suggest how I can stop the job from running, or what else I can try to stop the application while keeping the cluster running?

Something just occurred to me: calling sparkContext.stop() should do it, but that requires a bit of work in the code, which is OK. Can you suggest any other way that doesn't involve a code change?

asked May 07 '15 by jakstack


2 Answers

If you wish to kill an application that is failing repeatedly, you may do so through:

./bin/spark-class org.apache.spark.deploy.Client kill <master url> <driver ID>

You can find the driver ID through the standalone Master web UI at http://<master url>:8080.

From the Spark documentation.
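
For example, with a master at spark://<master-host>:7077 and a driver ID copied from that web UI (the ID below is only illustrative), the call looks like:

./bin/spark-class org.apache.spark.deploy.Client kill spark://<master-host>:7077 driver-20150507123456-0000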

answered Oct 04 '22 by yjshen


Revisiting this because I wasn't able to use the existing answer without debugging a few things.

My goal was to programmatically kill a driver that runs persistently once a day, deploy any updates to the code, then restart it, so I won't know ahead of time what my driver ID is.

It took me some time to figure out that you can only kill drivers that were submitted with the --deploy-mode cluster option. It also took me some time to realize that there is a difference between the application ID and the driver ID: while you can easily correlate an application name with an application ID, I have yet to find a way to divine the driver ID through Spark's API endpoints and correlate it to either an application name or the class you are running. So while spark-class org.apache.spark.deploy.Client kill <master url> <driver ID> works, you need to make sure you are deploying your driver in cluster mode and are using the driver ID, not the application ID.

Additionally, there is a submission endpoint that Spark provides by default at http://<spark master>:6066/v1/submissions, and you can use http://<spark master>:6066/v1/submissions/kill/<driver ID> to kill your driver.
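
For example, a plain HTTP POST is enough to kill a driver (the driver ID below is only illustrative; use the one shown for your job in the master UI):

curl -X POST http://<spark master>:6066/v1/submissions/kill/driver-20150507123456-0000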

Since I wasn't able to find the driver ID correlated to a specific job from any API endpoint, I wrote a Python web scraper that pulls the info from the basic Spark master web page at port 8080 and then kills the driver using the endpoint at port 6066. I'd prefer to get this data in a supported way, but this is the best solution I could find.

#!/usr/bin/python

import sys, re, requests, json
from selenium import webdriver

# Fully qualified class names of the drivers to kill, passed as command-line
# arguments (skip argv[0], which is the script name itself)
classes_to_kill = sys.argv[1:]
spark_master = 'masterurl'

# Load the standalone master web UI with a headless browser
driver = webdriver.PhantomJS()
driver.get("http://" + spark_master + ":8080/")

# Walk the "Running Drivers" table and inspect each driver ID cell
for running_driver in driver.find_elements_by_xpath("//*/div/h4[contains(text(), 'Running Drivers')]"):
    for driver_id in running_driver.find_elements_by_xpath("..//table/tbody/tr/td[contains(text(), 'driver-')]"):
        for class_to_kill in classes_to_kill:
            # Only kill drivers whose main class matches one of the requested classes
            right_class = driver_id.find_elements_by_xpath("../td[text()='" + class_to_kill + "']")
            if len(right_class) > 0:
                driver_to_kill = re.search(r'^driver-\S+', driver_id.text).group(0)
                print "Killing " + driver_to_kill
                # Ask the master's REST submission endpoint (port 6066) to kill the driver
                result = requests.post("http://" + spark_master + ":6066/v1/submissions/kill/" + driver_to_kill)
                print json.dumps(json.loads(result.text), indent=4)

# Shut down the headless browser
driver.quit()
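
If the script is saved as, say, kill_spark_drivers.py (the script name and class name here are placeholders), it can be invoked with the main classes of the drivers you want killed:

./kill_spark_drivers.py com.example.MyJob

The kill itself still goes through the same :6066 submission endpoint mentioned above; the scraping only exists to translate a class name into a driver ID.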
answered Oct 04 '22 by Nathan Loyer