I'm using Spark 0.9.1 with MLlib 0.9.1 on DSE. When I try to run the following code in standalone mode:
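import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}
// sc is the application's SparkContext; every point has the same label and features
val parsedData = sc.parallelize((1 to 1000).map { line =>
  LabeledPoint(0.0, Array(0.0, 0.4, 0.3))
})
val numIterations = 2
val model = LinearRegressionWithSGD.train(parsedData, numIterations)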
I'm getting this error:
14/09/20 14:28:37 ERROR OneForOneStrategy: org.jblas.DoubleMatrix cannot be cast to org.jblas.DoubleMatrix
java.lang.ClassCastException: org.jblas.DoubleMatrix cannot be cast to org.jblas.DoubleMatrix
at org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1$$anonfun$2.apply(GradientDescent.scala:150)
at org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1$$anonfun$2.apply(GradientDescent.scala:150)
at org.apache.spark.rdd.RDD$$anonfun$6.apply(RDD.scala:677)
at org.apache.spark.rdd.RDD$$anonfun$6.apply(RDD.scala:674)
at org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:846)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:601)
This happens only when I run it as a standalone application. It works in the Spark shell (dse spark). Any ideas?
Updates:
When I create an object in the REPL, getClassLoader returns:
scala> new org.jblas.DoubleMatrix().getClass().getClassLoader()
res3: ClassLoader = ModuleClassLoader:Analytics
But when I run in standalone mode (with spark-class), the same call returns:
new org.jblas.DoubleMatrix().getClass().getClassLoader():
class= SystemClassLoader
Maybe this is a hint.
I use SBT to generate the jar and submit it with spark-class. Here is the build configuration:
name := "analytics"

version := "1.0"

scalaVersion := "2.10.3"

unmanagedJars in Compile ++=
  Attributed.blankSeq((file("./dse/lib/") * "*.jar").get)

unmanagedJars in Compile ++=
  Attributed.blankSeq((file("./dse/resources/spark/lib/") * "*.jar").get)

unmanagedJars in Compile ++=
  Attributed.blankSeq((file("./dse/resources/cassandra/lib/") * "*.jar").get)

unmanagedJars in Runtime ++=
  Attributed.blankSeq((file("./dse/resources/hadoop/") * "*.jar").get)

unmanagedJars in Runtime ++=
  Attributed.blankSeq((file("./dse/resources/hadoop/lib/") * "*.jar").get)

unmanagedJars in Compile ++=
  Attributed.blankSeq((file("./dse/resources/driver/lib/") * "*.jar").get)
Update 2: I used the configuration from the DSE demos to build and deploy with Ant, but I get the same error.
This indeed seems to be a classloading problem. In particular, I believe you're hitting this bug, fixed in 1.0.
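You can't cast an object whose class was loaded by one class loader to the same-named class loaded by a different class loader. The JVM treats the two as distinct types, which is why the message reads "org.jblas.DoubleMatrix cannot be cast to org.jblas.DoubleMatrix".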
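To see which loader defined a class and which jar it came from, a small diagnostic along these lines may help. This is only a sketch: `LoaderInfo`, `describe`, and `sameRuntimeClass` are names made up for illustration, not part of Spark, jblas, or DSE.

```scala
object LoaderInfo {
  /** Describes which class loader defined `cls` and where its bytes came from. */
  def describe(cls: Class[_]): String = {
    // getClassLoader returns null for classes defined by the bootstrap loader.
    val loader = Option(cls.getClassLoader).map(_.toString).getOrElse("bootstrap")
    // The code source (and its location) may be null, e.g. for JDK classes.
    val source = for {
      domain     <- Option(cls.getProtectionDomain)
      codeSource <- Option(domain.getCodeSource)
      location   <- Option(codeSource.getLocation)
    } yield location.toString
    s"${cls.getName} [loader=$loader, source=${source.getOrElse("unknown")}]"
  }

  /** True only if both objects share one runtime class (same name and same defining loader). */
  def sameRuntimeClass(a: AnyRef, b: AnyRef): Boolean = a.getClass eq b.getClass
}
```

In the failing application you could log `LoaderInfo.describe(classOf[org.jblas.DoubleMatrix])` on the driver and inside a task. If the two lines show different loaders or different jars, that would be consistent with the explanation above.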
There's a small chance you may find a solution by changing the context class loader manually. It requires that you can actually get a reference to an appropriate class loader, which may or may not be possible in your case. Something like:
Thread.currentThread().setContextClassLoader(...)
But since I know nothing about DSE, I'll have to refer you to this article: http://www.datastax.com/dev/blog/classloading-in-dse-analytics
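If you do try swapping the context class loader, it is probably safer to restore the previous one afterwards. Here is a minimal sketch. `ContextLoader` and `withContextClassLoader` are hypothetical names, and which loader to pass in (and whether this resolves the DSE case at all) is an open assumption.

```scala
object ContextLoader {
  /** Runs `body` with `loader` as the current thread's context class loader,
    * then restores the previous loader even if `body` throws. */
  def withContextClassLoader[T](loader: ClassLoader)(body: => T): T = {
    val thread = Thread.currentThread()
    val previous = thread.getContextClassLoader
    thread.setContextClassLoader(loader)
    try body
    finally thread.setContextClassLoader(previous)
  }
}
```

For example, `ContextLoader.withContextClassLoader(someLoader) { LinearRegressionWithSGD.train(parsedData, numIterations) }`, where `someLoader` stands for whatever loader you manage to get hold of. Note that this only affects code that consults the context class loader of the calling thread. Classes that are already loaded keep their original defining loader.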