
Apache Spark: Classloader cannot find classDef in the jar

I'm running a job in Apache Spark in local mode that saves its result to the s3a file system. Since Hadoop 2.6 doesn't come with an s3a:// implementation (or s3://, s3n://), I packaged an uber jar that includes all transitive dependencies of hadoop-aws 2.6.0, and submit it together with the jar of my main job.
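For context, such an uber jar can be produced with a build roughly like the one below (a minimal build.sbt sketch assuming the sbt-assembly plugin; the Spark version and merge rules are illustrative, not taken from my actual build):

    // build.sbt -- minimal sketch, assuming the sbt-assembly plugin;
    // versions and merge strategy are illustrative
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"  % "1.3.1" % "provided",
      // hadoop-aws 2.6.0 transitively pulls in aws-java-sdk, commons-logging, etc.
      "org.apache.hadoop" %  "hadoop-aws" % "2.6.0"
    )

    // duplicate META-INF entries from the transitive jars must be merged or dropped
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case _                             => MergeStrategy.first
    }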

However, when I test it with the following minimalistic code:

sc.parallelize(1 to 100).saveAsTextFile("s3a://***/test10/")

On my first run, it failed at runtime with this error:

java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
    at com.amazonaws.auth.AWSCredentialsProviderChain.<clinit>(AWSCredentialsProviderChain.java:41)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:112)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.spark.SparkHadoopWriter$.createPathFromString(SparkHadoopWriter.scala:170)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:953)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:863)
    at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1290)

And if I try to run it again, it gives me this error instead (presumably because the failed static initialization from the first run left the class marked as unusable by the JVM):

java.lang.NoClassDefFoundError: Could not initialize class com.amazonaws.auth.AWSCredentialsProviderChain
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:112)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.spark.SparkHadoopWriter$.createPathFromString(SparkHadoopWriter.scala:170)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:953)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:863)
    at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1290)

The strange thing is that both LogFactory and AWSCredentialsProviderChain are in the uber jar I mentioned. I've also checked the other jars, including the Spark library on the workers and the jar of my main job (already deployed to the spark/worker directory), and can confirm that none of them contains a class with an identical name. So it can't be a jar-hell issue (besides, in that case the thrown error should be a NoSuchFieldError or NoSuchMethodError). Do you have any clue what may have happened and how to fix it?
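
One way to double-check which jar a class is actually loaded from at runtime is a generic JVM lookup like the following (a sketch, not something from my setup; run it in spark-shell or from the driver):

    // Prints the code source (jar location) a class is loaded from, or the
    // failure if it cannot be loaded at all; a missing code source usually
    // means the bootstrap class loader
    def whereIs(className: String): Unit = {
      try {
        val cls = Class.forName(className)
        val src = Option(cls.getProtectionDomain.getCodeSource)
        println(s"$className -> ${src.map(_.getLocation).getOrElse("bootstrap/unknown")}")
      } catch {
        case t: Throwable => println(s"$className -> not loadable: $t")
      }
    }

    whereIs("org.apache.commons.logging.LogFactory")
    whereIs("com.amazonaws.auth.AWSCredentialsProviderChain")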

asked Nov 09 '22 by tribbloid


1 Answer

I ran into a similar problem before, and my solution was to add the uber jar itself to --driver-class-path when running spark-submit. Your uber jar isn't executed directly by the JVM; instead, it is run through Spark's driver wrapper. Adding the uber jar to the driver's classpath may seem redundant, but it can sometimes resolve strange NoClassDefFoundErrors. I'm not sure it will solve your problem, but it's worth a try.
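
Something along these lines (the jar path and main class are placeholders, not taken from the question):

    # illustrative spark-submit invocation; adjust the paths and class to your job
    spark-submit \
      --class com.example.MyJob \
      --master local[*] \
      --driver-class-path /path/to/my-uber.jar \
      /path/to/my-uber.jar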

answered Nov 14 '22 by Wesley Miao