 

spark-submit continues to hang after job completion

I am trying to test Spark 1.6 with HDFS on AWS. I am using the wordcount Python example available in the examples folder. I submit the job with spark-submit; the job completes successfully and prints the results on the console as well. The web UI also shows it as completed. However, spark-submit itself never terminates. I have verified that the context is stopped in the wordcount example code as well.

What could be wrong?

This is what I see on the console.

2016-05-24 14:58:04,749 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
2016-05-24 14:58:04,749 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/stages/json,null}
2016-05-24 14:58:04,749 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/stages,null}
2016-05-24 14:58:04,749 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
2016-05-24 14:58:04,750 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
2016-05-24 14:58:04,750 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
2016-05-24 14:58:04,750 INFO  [Thread-3] handler.ContextHandler (ContextHandler.java:doStop(843)) - stopped o.s.j.s.ServletContextHandler{/jobs,null}
2016-05-24 14:58:04,802 INFO  [Thread-3] ui.SparkUI (Logging.scala:logInfo(58)) - Stopped Spark web UI at http://172.30.2.239:4040
2016-05-24 14:58:04,805 INFO  [Thread-3] cluster.SparkDeploySchedulerBackend (Logging.scala:logInfo(58)) - Shutting down all executors
2016-05-24 14:58:04,805 INFO  [dispatcher-event-loop-2] cluster.SparkDeploySchedulerBackend (Logging.scala:logInfo(58)) - Asking each executor to shut down
2016-05-24 14:58:04,814 INFO  [dispatcher-event-loop-5] spark.MapOutputTrackerMasterEndpoint (Logging.scala:logInfo(58)) - MapOutputTrackerMasterEndpoint stopped!
2016-05-24 14:58:04,818 INFO  [Thread-3] storage.MemoryStore (Logging.scala:logInfo(58)) - MemoryStore cleared
2016-05-24 14:58:04,818 INFO  [Thread-3] storage.BlockManager (Logging.scala:logInfo(58)) - BlockManager stopped
2016-05-24 14:58:04,820 INFO  [Thread-3] storage.BlockManagerMaster (Logging.scala:logInfo(58)) - BlockManagerMaster stopped
2016-05-24 14:58:04,821 INFO  [dispatcher-event-loop-3] scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint (Logging.scala:logInfo(58)) - OutputCommitCoordinator stopped!
2016-05-24 14:58:04,824 INFO  [Thread-3] spark.SparkContext (Logging.scala:logInfo(58)) - Successfully stopped SparkContext
2016-05-24 14:58:04,827 INFO  [sparkDriverActorSystem-akka.actor.default-dispatcher-2] remote.RemoteActorRefProvider$RemotingTerminator (Slf4jLogger.scala:apply$mcV$sp(74)) - Shutting down remote daemon.
2016-05-24 14:58:04,828 INFO  [sparkDriverActorSystem-akka.actor.default-dispatcher-2] remote.RemoteActorRefProvider$RemotingTerminator (Slf4jLogger.scala:apply$mcV$sp(74)) - Remote daemon shut down; proceeding with flushing remote transports.
2016-05-24 14:58:04,843 INFO  [sparkDriverActorSystem-akka.actor.default-dispatcher-2] remote.RemoteActorRefProvider$RemotingTerminator (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting shut down.

I have to press Ctrl-C to terminate the spark-submit process. This is a really weird problem, and I have no idea how to fix it. Please let me know if there are any logs I should be looking at, or whether I should be doing something differently here.

Here is the pastebin link of the jstack output of spark-submit process: http://pastebin.com/Nfnt4XmT

Pradeep Nayak asked May 24 '16


People also ask

How do I run a spark job in the background?

In general, if you want a process to keep running, you can run it in the background. In your case, the job will continue running until you specifically kill it with yarn application -kill <applicationId>. So even if you kill the spark-submit process, the job will continue to run, since YARN manages it after submission.

How do you pause a spark job?

From the cluster management console, click Spark Instance Groups. Select the Spark instance group whose Spark batch application schedule you want to pause. Click the Applications tab, then Application schedules. Select one or more Spark batch application schedules in the Active state and click Pause.

How do I know if my spark job is progressing?

You can view the status of a Spark Application that is created for the notebook in the status widget on the notebook panel. The widget also displays links to the Spark UI, Driver Logs, and Kernel Log. Additionally, you can view the progress of the Spark job when you run the code.
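
Beyond the notebook widget, the same information is available programmatically through SparkContext.statusTracker, a public Spark API. A minimal sketch, assuming an already-created SparkContext named sc:

    // Poll the status tracker for active jobs and their stage-level progress.
    val tracker = sc.statusTracker
    for (jobId <- tracker.getActiveJobIds()) {
      for (job <- tracker.getJobInfo(jobId)) { // Option[SparkJobInfo]
        println(s"job $jobId: ${job.status}")
        for (stageId <- job.stageIds(); stage <- tracker.getStageInfo(stageId)) {
          println(s"  stage $stageId: ${stage.numCompletedTasks()} of ${stage.numTasks()} tasks complete")
        }
      }
    }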

How do I know if my spark job failed?

When a Spark job or application fails, you can use the Spark logs to analyze the failures. The QDS UI provides links to the logs in the Application UI and Spark Application UI. If you are running the Spark job or application from the Analyze page, you can access the logs via the Application UI and Spark Application UI.


1 Answer

I had the same issue with a custom thread pool in my Spark job code. It turns out that spark-submit hangs when your code uses custom non-daemon thread pools: non-daemon threads keep the JVM alive even after the SparkContext has been stopped. You can check ThreadUtils.newDaemonCachedThreadPool() to see how the Spark developers create thread pools, or reuse those utilities yourself, but be careful: they are package-private.
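
For illustration, here is a minimal hand-rolled equivalent of such a daemon pool, roughly what ThreadUtils.newDaemonCachedThreadPool() does internally (the method name and prefix parameter below are just for this sketch; Spark's own version builds the factory with Guava's ThreadFactoryBuilder):

    import java.util.concurrent.{Executors, ExecutorService, ThreadFactory}
    import java.util.concurrent.atomic.AtomicInteger

    // Sketch: a cached thread pool whose threads are daemons, so they do not
    // keep the JVM (and hence spark-submit) alive after the driver finishes.
    def newDaemonCachedThreadPool(prefix: String): ExecutorService = {
      val counter = new AtomicInteger(0)
      val factory = new ThreadFactory {
        override def newThread(r: Runnable): Thread = {
          val t = new Thread(r, s"$prefix-${counter.incrementAndGet()}")
          t.setDaemon(true) // the crucial bit: non-daemon threads block JVM exit
          t
        }
      }
      Executors.newCachedThreadPool(factory)
    }

Alternatively, shut your pool down explicitly (pool.shutdown()) before the application exits.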

Eugene Lopatkin answered Oct 13 '22