I was trying to follow the Spark standalone application example described here https://spark.apache.org/docs/latest/quick-start.html#standalone-applications
The example ran fine with the following invocation:
spark-submit --class "SimpleApp" --master local[4] target/scala-2.10/simple-project_2.10-1.0.jar
However, when I tried to introduce some third-party libraries via --jars, it threw a ClassNotFoundException:
$ spark-submit --jars /home/linpengt/workspace/scala-learn/spark-analysis/target/pack/lib/* \
--class "SimpleApp" --master local[4] target/scala-2.10/simple-project_2.10-1.0.jar
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Exception in thread "main" java.lang.ClassNotFoundException: SimpleApp
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:300)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
If I remove the --jars option, the program runs again (I haven't actually started using those libraries yet). What's the problem here? How should I add the external jars?
According to spark-submit's --help, the --jars option expects a comma-separated list of local jars to include on the driver and executor classpaths.
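In other words, with two hypothetical jars libA.jar and libB.jar (paths made up for illustration), the invocation would need to look something like this:
spark-submit --jars /path/to/libA.jar,/path/to/libB.jar \
--class "SimpleApp" --master local[4] path/to/myApp.jar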
I think what's happening here is that /home/linpengt/workspace/scala-learn/spark-analysis/target/pack/lib/* is expanding into a space-separated list of jars, so the second JAR in the list ends up being treated as the application jar.
One solution is to use your shell to build a comma-separated list of jars; here's a quick way of doing it in bash, based on this answer on StackOverflow (see that answer for more complex approaches that handle filenames that contain spaces):
spark-submit --jars $(echo /dir/of/jars/*.jar | tr ' ' ',') \
--class "SimpleApp" --master local[4] path/to/myApp.jar
Is your SimpleApp class in any specific package? It seems that you need to include the full package name in the command line. So, if the SimpleApp class is located in com.yourcompany.yourpackage, you'd have to submit the Spark job with --class "com.yourcompany.yourpackage.SimpleApp" instead of --class "SimpleApp". I had the same problem and changing the name to the full package and class name fixed it. Hope that helps!
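For example, a sketch of what that would look like, assuming the hypothetical package name above and the same application jar as in the question:
spark-submit --class "com.yourcompany.yourpackage.SimpleApp" --master local[4] \
target/scala-2.10/simple-project_2.10-1.0.jar
The fully qualified name has to match the package declaration at the top of the Scala source file; if SimpleApp isn't declared in any package, the plain --class "SimpleApp" is already correct and the problem lies elsewhere (e.g. the --jars expansion issue above).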