When I run the code
val home = "/Users/adremja/Documents/Kaggle/outbrain"
val documents_categories = sc.textFile(home + "/documents_categories.csv")
documents_categories take(10) foreach println
in spark-shell, it works perfectly:
scala> val home = "/Users/adremja/Documents/Kaggle/outbrain"
home: String = /Users/adremja/Documents/Kaggle/outbrain
scala> val documents_categories = sc.textFile(home + "/documents_categories.csv")
documents_categories: org.apache.spark.rdd.RDD[String] = /Users/adremja/Documents/Kaggle/outbrain/documents_categories.csv MapPartitionsRDD[21] at textFile at <console>:26
scala> documents_categories take(10) foreach println
document_id,category_id,confidence_level
1595802,1611,0.92
1595802,1610,0.07
1524246,1807,0.92
1524246,1608,0.07
1617787,1807,0.92
1617787,1608,0.07
1615583,1305,0.92
1615583,1806,0.07
1615460,1613,0.540646372
However, when I try to run it in Zeppelin, I get an error:
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
at org.apache.spark.SparkContext.withScope(SparkContext.scala:679)
at org.apache.spark.SparkContext.textFile(SparkContext.scala:797)
... 46 elided
Do you have any idea where the problem is?
I have Spark 2.0.1 from Homebrew (I linked it in zeppelin-env.sh as SPARK_HOME) and the Zeppelin 0.6.2 binary from Zeppelin's website.
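For reference, the SPARK_HOME line in my conf/zeppelin-env.sh looks roughly like the one below; the exact Homebrew path is an assumption and depends on your install:

# assumed Homebrew layout; adjust to your machine
export SPARK_HOME=/usr/local/Cellar/apache-spark/2.0.1/libexec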
OK, it looks like I found the solution. From the lib folder in Zeppelin I deleted the old Jackson jars and replaced them with version 2.6.5, which Spark uses.
It works now, but I don't know whether I broke anything else.
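For context on why swapping those jars helps: RDDOperationScope's companion object builds a Jackson ObjectMapper when the class is first initialized, so a binary-incompatible Jackson on the classpath makes that static initialization fail, which surfaces as the NoClassDefFoundError above. A quick way to see which Jackson version actually wins on the classpath is a one-liner like this in a Zeppelin paragraph (a diagnostic sketch, not something from the original setup):

// Prints the Jackson version the interpreter's classpath resolved to,
// e.g. 2.6.5 after the swap; 2.5.x would mean the old Zeppelin jars still win.
println(com.fasterxml.jackson.databind.cfg.PackageVersion.VERSION)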
It seems to be a problem with the Spark version: Zeppelin 0.6.2 supports Spark 1.6, while you are running Spark 2.0, so the bundled jars may not be compatible.
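To confirm the mismatch before changing any jars, you can print the Spark version the Zeppelin interpreter actually loaded (a simple sanity check):

// In a Zeppelin paragraph: should print 2.0.1 if SPARK_HOME is picked up correctly.
println(sc.version)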