How to Keep a Spark Job Running

I've submitted my Spark job on the Ambari server using the following command:

  ./spark-submit --class customer.core.classname \
    --master yarn --num-executors 2 \
    --driver-memory 2g --executor-memory 2g --executor-cores 1 \
    /home/hdfs/Test/classname-0.0.1-SNAPSHOT-SNAPSHOT.jar \
    newdata host:6667

and it is working fine.

But how can I make it keep on running? Even if I close the command prompt or kill the spark-submit process, the job should keep running.

Any help is appreciated.

asked May 13 '16 by Mohan.V


People also ask

How do I keep a Spark job running?

In general, if you want a process to keep running, you can run it as a background process. In your case, the job will continue running until you specifically kill it using yarn application -kill, so even if you kill the spark-submit process it will continue to run, since YARN manages it after submission.
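
For example, assuming the job runs on YARN, you can stop it explicitly like this (the application ID below is a made-up placeholder; use the one YARN assigned to your job):

    # Kill the YARN-managed job; closing the submitting shell alone won't stop it
    yarn application -kill application_1463120123456_0001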

How do I know if Spark jobs are running?

Click Analytics > Spark Analytics > Open the Spark Application Monitoring Page. Click Monitor > Workloads, and then click the Spark tab. This page displays the names of the clusters that you are authorized to monitor and the number of applications that are currently running in each cluster.
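
If you do not have that monitoring page, the same information is available from the YARN command line on any cluster node:

    # List all applications currently in the RUNNING state
    yarn application -list -appStates RUNNING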

How do you pause a Spark job?

From the cluster management console, click Spark Instance Groups. Select the Spark instance group whose Spark batch application schedule you want to pause. Click the Applications tab; then Application schedules. Select one or more Spark batch application schedules in the Active state and click Pause.

How do I run a Spark job in cluster mode?

In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
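
For illustration, the mode is selected with the --deploy-mode flag of spark-submit. A minimal sketch using the SparkPi example class that ships with Spark (the examples jar path varies by Spark version):

    # Cluster mode: the driver runs inside the YARN application master,
    # so this machine can disconnect after submission
    ./spark-submit --class org.apache.spark.examples.SparkPi \
      --master yarn --deploy-mode cluster \
      examples/jars/spark-examples.jar

    # Client mode: the driver runs in this shell's process and exits with it
    ./spark-submit --class org.apache.spark.examples.SparkPi \
      --master yarn --deploy-mode client \
      examples/jars/spark-examples.jar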


1 Answer

You can achieve this in a couple of ways:

1) You can run the spark-submit driver process in the background using nohup, e.g.:

nohup ./spark-submit --class customer.core.classname \
  --master yarn --num-executors 2 \
  --driver-memory 2g --executor-memory 2g --executor-cores 1 \
  /home/hdfs/Test/classname-0.0.1-SNAPSHOT-SNAPSHOT.jar \
  newdata host:6667 &
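
By default nohup appends everything the driver would have printed to the terminal to a file named nohup.out in the current directory, so you can still follow the output while the job runs:

    # Watch the driver output of the backgrounded job
    tail -f nohup.out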

2) Run with the deploy mode set to cluster so that the driver process runs on a different node; a sketch follows below.
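
A minimal sketch of the same submission in cluster mode (flags otherwise unchanged from the question's command):

    ./spark-submit --class customer.core.classname \
      --master yarn --deploy-mode cluster \
      --num-executors 2 \
      --driver-memory 2g --executor-memory 2g --executor-cores 1 \
      /home/hdfs/Test/classname-0.0.1-SNAPSHOT-SNAPSHOT.jar \
      newdata host:6667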

answered Sep 20 '22 by Vishnu V R