How to give dependent jars to spark submit in cluster mode

I am running Spark in cluster deploy mode. Below is the command:

JARS=$JARS_HOME/amqp-client-3.5.3.jar,$JARS_HOME/nscala-time_2.10-2.0.0.jar,\
$JARS_HOME/rabbitmq-0.1.0-RELEASE.jar,\
$JARS_HOME/kafka_2.10-0.8.2.1.jar,$JARS_HOME/kafka-clients-0.8.2.1.jar,\
$JARS_HOME/spark-streaming-kafka_2.10-1.4.1.jar,\
$JARS_HOME/zkclient-0.3.jar,$JARS_HOME/protobuf-java-2.4.0a.jar

dse spark-submit -v --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
 --executor-memory 512M \
 --total-executor-cores 3 \
 --deploy-mode "cluster" \
 --master spark://$MASTER:7077 \
 --jars=$JARS \
 --supervise \
 --class "com.testclass" $APP_JAR  input.json \
 --files "/home/test/input.json"

The above command works fine in client mode, but when I use it in cluster mode I get a class-not-found exception:

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
    at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$

In client mode the dependent jars get copied to the /var/lib/spark/work directory, whereas in cluster mode they do not. Please help me get this solved.

EDIT:

I am using NFS and have mounted the same directory on all the Spark nodes under the same name. I still get the error. How is it able to pick up the application jar, which is in the same directory, but not the dependent jars?

Asked by Knight71


1 Answer

In client mode the dependent jars get copied to the /var/lib/spark/work directory, whereas in cluster mode they do not.

In cluster mode the driver program runs on one of the cluster nodes rather than on your local machine (as in client mode), so the dependent jars must be accessible from the cluster nodes; otherwise the driver program and the executors will throw a java.lang.NoClassDefFoundError.
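
For example, one way to make the jars reachable from whichever node ends up running the driver is to stage them somewhere every node can read, such as HDFS, and pass hdfs:// URLs to --jars. This is only a sketch; the /user/test/jars location and the shortened jar list are made up for illustration:

# Stage the dependent jars on a location every node can read
# (the /user/test/jars HDFS path is hypothetical).
hdfs dfs -mkdir -p /user/test/jars
hdfs dfs -put $JARS_HOME/*.jar /user/test/jars/

# Reference the jars by hdfs:// URLs so the cluster-mode driver
# can fetch them before the application starts.
JARS=hdfs:///user/test/jars/amqp-client-3.5.3.jar,\
hdfs:///user/test/jars/spark-streaming-kafka_2.10-1.4.1.jar

dse spark-submit \
 --deploy-mode cluster \
 --master spark://$MASTER:7077 \
 --jars $JARS \
 --class "com.testclass" $APP_JAR input.json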

When using spark-submit, the application jar along with any jars included with the --jars option is automatically transferred to the cluster.

Your extra jars can be added to --jars; they will be copied to the cluster automatically.

Please refer to the "Advanced Dependency Management" section at the link below:
http://spark.apache.org/docs/latest/submitting-applications.html
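
That section also lists the URL schemes --jars accepts (file:, hdfs:, http:, ftp: and local:). Since you already mount the same NFS directory under the same name on every node, a local: URI is another option: each node reads the jar from its own filesystem instead of having it copied over the network. Again just a sketch, assuming the mount point really is identical everywhere and reusing $JARS_HOME from your command:

# local: URIs are resolved on each node's own filesystem,
# so nothing is shipped over the network.
JARS=local:$JARS_HOME/amqp-client-3.5.3.jar,\
local:$JARS_HOME/spark-streaming-kafka_2.10-1.4.1.jar

dse spark-submit \
 --deploy-mode cluster \
 --master spark://$MASTER:7077 \
 --jars $JARS \
 --class "com.testclass" $APP_JAR input.json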

Answered by Shawn Guo