I'm trying to use Spark 2.3.1 with Java.
I followed the examples in the documentation but keep getting a poorly described exception when calling .fit(trainingData).
Exception in thread "main" java.lang.IllegalArgumentException
at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:46)
at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:449)
at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:432)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:103)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:432)
at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
at org.apache.xbean.asm5.ClassReader.b(Unknown Source)
at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:262)
at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:261)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:261)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2299)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2073)
at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1358)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.take(RDD.scala:1331)
at org.apache.spark.ml.tree.impl.DecisionTreeMetadata$.buildMetadata(DecisionTreeMetadata.scala:112)
at org.apache.spark.ml.tree.impl.RandomForest$.run(RandomForest.scala:105)
at org.apache.spark.ml.classification.DecisionTreeClassifier.train(DecisionTreeClassifier.scala:116)
at org.apache.spark.ml.classification.DecisionTreeClassifier.train(DecisionTreeClassifier.scala:45)
at org.apache.spark.ml.Predictor.fit(Predictor.scala:118)
at com.example.spark.MyApp.main(MyApp.java:36)
I took this dummy dataset for classification (data.csv):
f,label
1,1
1.5,1
0,0
2,2
2.5,2
My code:
import org.apache.spark.ml.classification.DecisionTreeClassificationModel;
import org.apache.spark.ml.classification.DecisionTreeClassifier;
import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
        .master("local[1]")
        .appName("My App")
        .getOrCreate();

Dataset<Row> data = spark.read().format("csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .load("C:\\tmp\\data.csv");
data.show(); // see output(1) below

VectorAssembler assembler = new VectorAssembler()
        .setInputCols(new String[]{"f"})
        .setOutputCol("features");
Dataset<Row> trainingData = assembler.transform(data)
        .select("features", "label");
trainingData.show(); // see output(2) below

DecisionTreeClassifier clf = new DecisionTreeClassifier();
DecisionTreeClassificationModel model = clf.fit(trainingData); // fails here (MyApp.java:36)

Dataset<Row> predictions = model.transform(trainingData);
predictions.show(); // never reached
Output(1):
+---+-----+
| f|label|
+---+-----+
|1.0| 1|
|1.5| 1|
|0.0| 0|
|2.0| 2|
|2.5| 2|
+---+-----+
Output(2):
+--------+-----+
|features|label|
+--------+-----+
| [1.0]| 1|
| [1.5]| 1|
| [0.0]| 0|
| [2.0]| 2|
| [2.5]| 2|
+--------+-----+
My build.gradle file looks like this:
plugins {
    id 'java'
    id 'application'
}

group 'com.example'
version '1.0-SNAPSHOT'
sourceCompatibility = 1.8
mainClassName = 'com.example.spark.MyApp'

repositories {
    mavenCentral()
}

dependencies {
    compile group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.3.1'
    compile group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.3.1'
    compile group: 'org.apache.spark', name: 'spark-mllib_2.11', version: '2.3.1'
}
What am I missing?
What Java version do you have installed on your machine? Your problem is probably related to Java 9.
If you switch to Java 8 (jdk-8u171, for instance), the exception will disappear, and output(3) of predictions.show() will look like this:
+--------+-----+-------------+-------------+----------+
|features|label|rawPrediction| probability|prediction|
+--------+-----+-------------+-------------+----------+
| [1.0]| 1|[0.0,2.0,0.0]|[0.0,1.0,0.0]| 1.0|
| [1.5]| 1|[0.0,2.0,0.0]|[0.0,1.0,0.0]| 1.0|
| [0.0]| 0|[1.0,0.0,0.0]|[1.0,0.0,0.0]| 0.0|
| [2.0]| 2|[0.0,0.0,2.0]|[0.0,0.0,1.0]| 2.0|
| [2.5]| 2|[0.0,0.0,2.0]|[0.0,0.0,1.0]| 2.0|
+--------+-----+-------------+-------------+----------+
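If you are not sure which JVM your application is actually running on (an IDE run configuration can pick a different JDK than the one on your PATH), a minimal sanity check is to print the runtime properties. This is just an illustration, not part of the original answer, and the JavaVersionCheck class name is made up:

public class JavaVersionCheck {
    public static void main(String[] args) {
        // Spark 2.3.1 only runs on Java 8, so "java.version" should start with "1.8".
        // A value such as "9" or "10" explains the IllegalArgumentException thrown
        // from the ASM5 ClassReader inside Spark's ClosureCleaner.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + System.getProperty("java.home"));
    }
}

Run it with the same configuration you use for MyApp; if the version does not start with 1.8, point that configuration (or JAVA_HOME) at a Java 8 JDK.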
I had the same problem. My system used Spark 2.2.0 with Java 8. We wanted to upgrade the server, but Spark 2.3.1 does not work with Java 10 yet, so in my case I kept Java 8 on the Spark server and upgraded only Spark to 2.3.1.
I read these posts about the subject:
https://issues.apache.org/jira/browse/SPARK-24421
Why apache spark does not work with java 10?