I'm implementing a machine learning algorithm in Apache Spark MLlib and I would like to multiply a vector by a scalar, i.e. compute u_i_j_m * x_i,
where u_i_j_m is a Double and x_i is a vector.
I've tried the following:
import breeze.linalg.{ DenseVector => BDV, Vector => BV}
import org.apache.spark.mllib.linalg.{DenseVector, Vectors, Vector}
...
private def runAlgorithm(data: RDD[VectorWithNorm]): = {
...
data.mapPartitions { data_ponts =>
c = Array.fill(clustersNum)(BDV.zeros[Double](dim).asInstanceOf[BV[Double]])
...
data_ponts.foreach { data_point =>
...
u_i_j_m : Double = ....
val temp = (data_point.vector * u_i_j_m)
// c(j) = temp
}
}
}
Where VectorWithNorm is defined as follows:
class VectorWithNorm(val vector: Vector, val norm: Double) extends Serializable {
def this(vector: Vector) = this(vector, Vectors.norm(vector, 2.0))
def this(array: Array[Double]) = this(Vectors.dense(array))
def toDense: VectorWithNorm = new VectorWithNorm(Vectors.dense(vector.toArray), norm)
}
But when I build the project I get the following error:
Error: value * is not a member of org.apache.spark.mllib.linalg.Vector
val temp = (data_point.vector * u_i_j_m)
How can I do this multiplication correctly?
Unfortunately, the Spark Scala contributors decided not to commit to a single library for the underlying computations, i.e. linear algebra, in Scala. Under the hood they use Breeze, but scalar * and + on Spark Vectors are private, as are other useful methods. This is quite different from Python, where you can use the excellent NumPy linear algebra library. The argument was that the developers were stretched thin, that Breeze looked risky because its development had stalled (if I remember correctly), and that there was an alternative (apache.commons.math), so they decided to let users pick which linear algebra library to use in Scala. Prompted by some members of the community, there is now a spark-package that provides linear algebra on org.apache.spark.mllib.linalg.Vector; see here.
In your code you are using Spark's Vector trait instead of Breeze's DenseVector, which is why there is no * operator defined on your data_point.vector member.
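One possible workaround, shown here only as a minimal sketch, is to copy the Spark vector's values into a Breeze DenseVector and do the arithmetic there. It assumes Breeze is on the classpath (Spark MLlib already depends on it). The object name VectorScaling and the helpers toBreeze and scale are hypothetical names made up for this illustration; they are not part of Spark or Breeze.

```scala
import breeze.linalg.{DenseVector => BDV}
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object VectorScaling {
  // Hypothetical helper: copy a Spark vector's values into a Breeze DenseVector.
  // toArray produces a dense array, so a sparse input is densified here.
  // clone() keeps the Breeze vector from sharing storage with the Spark vector.
  def toBreeze(v: Vector): BDV[Double] = new BDV(v.toArray.clone())

  // Hypothetical helper: multiply a Spark vector by a scalar and
  // return the result as a new dense Spark vector.
  def scale(v: Vector, s: Double): Vector =
    Vectors.dense((toBreeze(v) * s).toArray)
}
```

In the loop from the question, this would let you write val temp = VectorScaling.toBreeze(data_point.vector) * u_i_j_m, and since a Breeze DenseVector is also a Breeze Vector, temp can be stored in c(j). Note that this always produces dense results, which may be costly if your data is very sparse.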