 

Spark throws ClassNotFoundException when using --jars option

Tags:

apache-spark

I was trying to follow the Spark standalone application example described at https://spark.apache.org/docs/latest/quick-start.html#standalone-applications.

The example ran fine with the following invocation:

spark-submit  --class "SimpleApp" --master local[4] target/scala-2.10/simple-project_2.10-1.0.jar

However, when I tried to introduce some third-party libraries via --jars, spark-submit threw a ClassNotFoundException:

$ spark-submit --jars /home/linpengt/workspace/scala-learn/spark-analysis/target/pack/lib/* \
  --class "SimpleApp" --master local[4] target/scala-2.10/simple-project_2.10-1.0.jar

Spark assembly has been built with Hive, including Datanucleus jars on classpath
Exception in thread "main" java.lang.ClassNotFoundException: SimpleApp
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:300)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

If I remove the --jars option, the program runs fine again (I haven't actually started using those libraries yet). What's the problem here? How should I add the external jars?

asked Jul 20 '14 by chtlp


2 Answers

According to spark-submit's --help, the --jars option expects a comma-separated list of local jars to include on the driver and executor classpaths.

I think that what's happening here is that /home/linpengt/workspace/scala-learn/spark-analysis/target/pack/lib/* is expanding into a space-separated list of jars and the second JAR in the list is being treated as the application jar.
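
To make that concrete, here is roughly what spark-submit receives after the shell expands the glob (the jar names dep-a.jar and dep-b.jar below are hypothetical):

spark-submit --jars /home/linpengt/workspace/scala-learn/spark-analysis/target/pack/lib/dep-a.jar \
  /home/linpengt/workspace/scala-learn/spark-analysis/target/pack/lib/dep-b.jar \
  --class "SimpleApp" --master local[4] target/scala-2.10/simple-project_2.10-1.0.jar

Here --jars consumes only dep-a.jar as its value; dep-b.jar is the next positional argument, so spark-submit treats it as the application jar, looks for SimpleApp inside it, and fails with ClassNotFoundException.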

One solution is to use your shell to build a comma-separated list of jars; here's a quick way of doing it in bash, based on this answer on StackOverflow (see that answer for more complex approaches that handle filenames that contain spaces):

spark-submit --jars $(echo /dir/of/jars/*.jar | tr ' ' ',') \
    --class "SimpleApp" --master local[4] path/to/myApp.jar
answered by Josh Rosen

Is your SimpleApp class in any specific package? It seems that you need to include the full package name in the command line. So, if the SimpleApp class is located in com.yourcompany.yourpackage, you'd have to submit the Spark job with --class "com.yourcompany.yourpackage.SimpleApp" instead of --class "SimpleApp". I had the same problem and changing the name to the full package and class name fixed it. Hope that helps!
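
For illustration, a sketch with a hypothetical package name: if SimpleApp.scala starts with package com.yourcompany.yourpackage, the working invocation looks like this:

spark-submit --class "com.yourcompany.yourpackage.SimpleApp" --master local[4] \
    target/scala-2.10/simple-project_2.10-1.0.jar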

answered by Margaret Katherine McNulty-Bel