
Spark Java Application: java.lang.ClassNotFoundException

Tags: apache-spark

I've created an Apache Spark application in Java. All it does is count the lines containing the word "spark", and it repeats that count 1000 times.

Here's my code:

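package org.spark.java.examples;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class Example1 {
    public static void main(String[] args) {
        String logfile = args[0];
        try {
            SparkConf conf = new SparkConf();
            conf.setAppName("Sample");
            conf.setMaster("spark://<master>:7077");
            conf.set("spark.executor.memory", "1g");
            JavaSparkContext sc = new JavaSparkContext(conf);
            // Cache the file so the repeated passes below reuse it.
            JavaRDD<String> logData = sc.textFile(logfile).cache();
            long count = 0;
            // Repeat the count 1000 times.
            for (int i = 0; i < 1000; i++) {
                // This anonymous class is compiled to Example1$1.
                count += logData.filter(new Function<String, Boolean>() {
                    public Boolean call(String s) {
                        return s.toLowerCase().contains("spark");
                    }
                }).count();
            }
        } catch (Exception ex) {
            System.out.println(ex.getMessage());
        }
    }
}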

When I debug it in Eclipse, I get a java.lang.ClassNotFoundException:

WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoundException
java.lang.ClassNotFoundException: org.spark.java.examples.Example1$1
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)

I also tried deploying it to the cluster with spark-submit, but I got the same exception. Here's a portion of the stack trace:

ERROR Executor: Exception in task ID 4
java.lang.ClassNotFoundException: org.spark.java.examples.Example1$1
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)

Any ideas on how to resolve this? Thanks in advance!

jaysonpryde asked Jun 13 '14 13:06


3 Answers

You need to ship the jar containing your job to the workers. To do that, have Maven build a jar and add that jar to the configuration before creating the context:

 conf.setJars(new String[]{"path/to/jar/Sample.jar"}); [*]

For a 'real' job you would need to build a jar with dependencies (see the Maven Shade plugin), but for a simple job with no external dependencies, a plain jar is sufficient.

[*] I'm not very familiar with the Spark Java API; I'm just assuming it should be something like this.
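To see why the missing class in the stack trace is called Example1$1, it may help to recall that javac names an anonymous inner class after its enclosing class plus a numeric suffix. The sketch below uses only the JDK; AnonymousClassNameDemo and LinePredicate are hypothetical stand-ins for the question's Example1 and Spark's Function, not part of Spark.

```java
public class AnonymousClassNameDemo {
    // Hypothetical stand-in for Spark's Function<String, Boolean>.
    interface LinePredicate {
        boolean test(String line);
    }

    // Returns the runtime class name of an anonymous inner class.
    static String anonymousClassName() {
        LinePredicate containsSpark = new LinePredicate() {
            public boolean test(String line) {
                return line.toLowerCase().contains("spark");
            }
        };
        return containsSpark.getClass().getName();
    }

    public static void main(String[] args) {
        // Prints a name ending in "$1": the enclosing class name plus a counter.
        System.out.println(anonymousClassName());
    }
}
```

The anonymous class is compiled into its own .class file, so the workers can only load it if the jar that contains it is shipped along with the job.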

maasg answered Oct 15 '22 18:10


You must include your jar in the workers' classpath. You can do this in two ways:

  • Call the SparkContext method addJar (documented at http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext).
  • Put the jar in the lib directory of your Spark distribution.

The first is the recommended method.
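As a rough, JDK-only illustration of what a missing classpath entry looks like, the sketch below asks a class loader that has no jars at all to load an application class. MissingJarDemo and canLoad are hypothetical names, and the empty URLClassLoader merely stands in for a worker JVM that never received the job's jar; no Spark API is involved.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class MissingJarDemo {
    // Returns true if the given loader can find the named class.
    static boolean canLoad(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String appClass = MissingJarDemo.class.getName();

        // The loader that started this program can see the application class.
        ClassLoader driverLike = MissingJarDemo.class.getClassLoader();
        System.out.println("own loader:   " + canLoad(driverLike, appClass));

        // A loader with no jars and no application parent cannot.
        try (URLClassLoader workerLike = new URLClassLoader(new URL[0], null)) {
            System.out.println("empty loader: " + canLoad(workerLike, appClass));
            // JDK classes are still visible through the bootstrap loader.
            System.out.println("JDK class:    " + canLoad(workerLike, "java.lang.String"));
        }
    }
}
```

Either of the two options above has the same goal: making sure the loader used on each worker has the application jar among its entries.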

Alvaro Agea answered Oct 15 '22 17:10


This can also happen if you do not give the fully qualified class name (including the package) on the spark-submit command line. If your application's main method is in test.spark.SimpleApp, the command line needs to look something like this:

./bin/spark-submit --class "test.spark.SimpleApp" --master local[2] /path_to_project/target/spark_testing-1.0-SNAPSHOT.jar

Passing just --class "SimpleApp" will fail with a ClassNotFoundException.
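The difference between a simple and a fully qualified class name can be reproduced with plain Class.forName, without Spark. This is only an analogy, on the assumption that spark-submit resolves the --class value by name in a similar way; ClassNameLookupDemo and resolves are hypothetical names.

```java
public class ClassNameLookupDemo {
    // Returns true if a class with exactly this name can be loaded.
    static boolean resolves(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The fully qualified name, package included, is found.
        System.out.println(resolves("java.util.ArrayList"));
        // The simple name alone is looked up in the default package and is not found.
        System.out.println(resolves("ArrayList"));
    }
}
```

A lookup by name does not search packages, which is why the package prefix has to be spelled out.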

DavidR answered Oct 15 '22 18:10