
Spark MLlib 0.9.1 org.jblas.DoubleMatrix errors

I'm using Spark 0.9.1 with MLlib 0.9.1 on DSE.

When I try to run the following code in standalone mode:

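import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

// sc is the application's SparkContext.
// Build 1000 identical labeled points (label 0.0, three features each).
val parsedData = sc.parallelize((1 to 1000).map { _ =>
  LabeledPoint(0.0, Array(0.0, 0.4, 0.3))
})
val numIterations = 2
val model = LinearRegressionWithSGD.train(parsedData, numIterations)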

I'm getting this error:

    14/09/20 14:28:37 ERROR OneForOneStrategy: org.jblas.DoubleMatrix cannot be cast to org.jblas.DoubleMatrix
java.lang.ClassCastException: org.jblas.DoubleMatrix cannot be cast to org.jblas.DoubleMatrix
        at org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1$$anonfun$2.apply(GradientDescent.scala:150)
        at org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1$$anonfun$2.apply(GradientDescent.scala:150)
        at org.apache.spark.rdd.RDD$$anonfun$6.apply(RDD.scala:677)
        at org.apache.spark.rdd.RDD$$anonfun$6.apply(RDD.scala:674)
        at org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:846)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:601)

This happens only when I run it as a standalone application. It works in the Spark shell (dse spark). Any ideas?

Updates:

When I create an object in the REPL, getClassLoader returns:

scala>  new org.jblas.DoubleMatrix().getClass().getClassLoader()
res3: ClassLoader = ModuleClassLoader:Analytics

But when I run in standalone mode (with spark-class), it returns:

new org.jblas.DoubleMatrix().getClass().getClassLoader():
class= SystemClassLoader

Maybe this is a hint.
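To compare the two environments more systematically, a small helper can report both the defining class loader and the location the class bytes were read from. This is only a sketch: `ClassOrigin` and `describe` are hypothetical names, not part of Spark or DSE, and the code uses only standard JDK reflection calls.

```scala
// Hypothetical diagnostic helper (not part of Spark or DSE).
object ClassOrigin {
  /** Describes which loader defined `cls` and where its bytes came from. */
  def describe(cls: Class[_]): String = {
    // getClassLoader returns null for classes defined by the bootstrap loader.
    val loader = Option(cls.getClassLoader).map(_.toString).getOrElse("bootstrap")
    // Any step of this chain may be null, so wrap each one in Option.
    val source = for {
      domain     <- Option(cls.getProtectionDomain)
      codeSource <- Option(domain.getCodeSource)
      location   <- Option(codeSource.getLocation)
    } yield location.toString
    s"${cls.getName} loader=$loader source=${source.getOrElse("unknown")}"
  }
}
```

Printing `ClassOrigin.describe(classOf[org.jblas.DoubleMatrix])` both in the shell and in the standalone application (ideally also from inside a task) should show whether the class is being picked up from two different jars or loaders.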

I use SBT to generate the jar and submit it with spark-class. Here is the configuration:

name := "analytics"

version := "1.0"

scalaVersion := "2.10.3"

unmanagedJars in Compile ++=
  Attributed.blankSeq((file("./dse/lib/") * "*.jar").get)

unmanagedJars in Compile ++=
  Attributed.blankSeq((file("./dse/resources/spark/lib/") * "*.jar").get)

unmanagedJars in Compile ++=
  Attributed.blankSeq((file("./dse/resources/cassandra/lib/") * "*.jar").get)

unmanagedJars in Runtime ++=
  Attributed.blankSeq((file("./dse/resources/hadoop/") * "*.jar").get)

unmanagedJars in Runtime ++=
  Attributed.blankSeq((file("./dse/resources/hadoop/lib/") * "*.jar").get)

unmanagedJars in Compile ++=
  Attributed.blankSeq((file("./dse/resources/driver/lib/") * "*.jar").get)

Update 2: I used the configuration from the DSE demos to build and deploy with Ant, but I still get the same error.

weakwire asked Nov 10 '22 01:11


1 Answer

This indeed seems to be a classloading problem. In particular, I believe you're hitting this bug, fixed in 1.0.

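You can't cast an object whose class was loaded by one class loader to the same-named class loaded by a different class loader. The JVM treats the two as distinct types, which is why the error reads "org.jblas.DoubleMatrix cannot be cast to org.jblas.DoubleMatrix".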
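The effect can be reproduced without Spark. The sketch below loads one class a second time through an isolated `URLClassLoader` and then attempts the cast. `Payload` and `ClassLoaderIdentityDemo` are hypothetical names standing in for a library class such as `org.jblas.DoubleMatrix`, and the code assumes it has been compiled to class files or a jar on disk (not typed into a REPL), so that the code source location is available.

```scala
import java.net.URLClassLoader

// Hypothetical stand-in for a library class such as org.jblas.DoubleMatrix.
class Payload

object ClassLoaderIdentityDemo {
  /** Returns (same class name?, same Class object?, did the cast fail?). */
  def run(): (Boolean, Boolean, Boolean) = {
    val original = classOf[Payload]
    // Directory or jar that Payload was loaded from (assumed to exist on disk).
    val location = original.getProtectionDomain.getCodeSource.getLocation
    // Parent = null, so this loader does not delegate to the application
    // loader and defines its own copy of Payload.
    val isolated = new URLClassLoader(Array(location), null)
    val reloaded = isolated.loadClass(original.getName)
    val instance: Any = reloaded.getDeclaredConstructor().newInstance()
    val castFailed =
      try {
        val p: Payload = instance.asInstanceOf[Payload] // forces a checked cast
        p == null
      } catch {
        case _: ClassCastException => true
      }
    (reloaded.getName == original.getName, reloaded eq original, castFailed)
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```

The two `Class` objects share a name but are not the same object, so the cast throws a `ClassCastException` of the same "X cannot be cast to X" shape as in the question.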

There's a small chance you may find a solution by changing the context class loader manually. It requires that you can actually get a reference to an appropriate class loader, which may or may not be possible in your case. Something like:

Thread.currentThread().setContextClassLoader(...)

But since I know nothing about DSE, I'll have to refer you to this article: http://www.datastax.com/dev/blog/classloading-in-dse-analytics
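If you do try swapping the context class loader, it is safer to restore the previous one afterwards. A minimal sketch, assuming you can obtain a suitable loader: `ContextLoader` and `withContextClassLoader` are hypothetical names, and whether this actually resolves the DSE problem is not something this answer can confirm.

```scala
// Hypothetical helper: runs `body` with `loader` as the thread's context
// class loader, then restores the previous loader even if `body` throws.
object ContextLoader {
  def withContextClassLoader[T](loader: ClassLoader)(body: => T): T = {
    val thread = Thread.currentThread()
    val previous = thread.getContextClassLoader
    thread.setContextClassLoader(loader)
    try body
    finally thread.setContextClassLoader(previous)
  }
}
```

One candidate to pass in is the loader that defined one of your own application classes (obtained with `getClassLoader` on that class), wrapping the code that builds and trains the model. Note that this only affects the current thread, so it would not change how classes are resolved on other threads.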

Francois G answered Nov 14 '22 23:11