
Scala vector scalar multiplication

I'm implementing a machine learning algorithm in Apache Spark MLlib and I would like to multiply a vector by a scalar, i.e. compute u_i_j_m * x_i, where u_i_j_m is a Double and x_i is a vector.

I've tried the following:

import breeze.linalg.{DenseVector => BDV, Vector => BV}
import org.apache.spark.mllib.linalg.{DenseVector, Vectors, Vector}
import org.apache.spark.rdd.RDD
...

private def runAlgorithm(data: RDD[VectorWithNorm]) = {
    ...
    data.mapPartitions { data_points =>
        val c = Array.fill(clustersNum)(BDV.zeros[Double](dim).asInstanceOf[BV[Double]])
        ...
        data_points.foreach { data_point =>
            ...
            val u_i_j_m: Double = ...
            // the compile error below is reported on this line
            val temp = (data_point.vector * u_i_j_m)
            // c(j) = temp
        }
    }
}

VectorWithNorm is defined as follows:

class VectorWithNorm(val vector: Vector, val norm: Double) extends Serializable {

    def this(vector: Vector) = this(vector, Vectors.norm(vector, 2.0))
    def this(array: Array[Double]) = this(Vectors.dense(array))
    def toDense: VectorWithNorm = new VectorWithNorm(Vectors.dense(vector.toArray), norm)
}

But when I build the project I get the following error:

Error: value * is not a member of org.apache.spark.mllib.linalg.Vector
    val temp = (data_point.vector * u_i_j_m)

How can I do this multiplication correctly?

Alex L asked Jan 07 '23 00:01


2 Answers

Unfortunately, the Spark Scala contributors decided not to pick a single library for the underlying computations (i.e. linear algebra) in Scala. Under the hood they use Breeze, but scalar * and + on Spark Vectors are private, as are other useful methods. This is quite different from Python, where you can use the excellent NumPy linear algebra library. The argument was that the developers are stretched thin, that Breeze looked risky because its development had stalled (if I remember correctly), and that there was an alternative (apache.commons.math), so they decided to let users pick which linear algebra library to use in Scala. However, prompted by some members of the community, there is now a Spark package that provides linear algebra on org.apache.spark.mllib.linalg.Vector.
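If you would rather not depend on any extra library, one option is to do the arithmetic on the underlying array yourself. The following is only a minimal sketch: PlainScaleSketch and scale are hypothetical names, not part of Spark, and it assumes you are fine with a dense result, since toArray materializes every element even for a sparse vector.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object PlainScaleSketch {
  // Hypothetical helper (not a Spark API): multiply every element of v by a.
  // Assumption: a dense result is acceptable, because toArray expands sparse vectors.
  def scale(v: Vector, a: Double): Vector =
    Vectors.dense(v.toArray.map(_ * a))
}
```

In the question's loop this would be used as PlainScaleSketch.scale(data_point.vector, u_i_j_m), which returns a new Spark Vector and leaves the original untouched.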

KrisP answered Jan 10 '23 20:01


In your code you are using Spark's Vector trait instead of Breeze's DenseVector; that is why no * operator is defined on your data_point.vector member.
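A possible way around this is to move the values into a Breeze DenseVector before doing the arithmetic. This is a sketch under assumptions: BreezeScaleSketch and toBreeze are hypothetical names, the sample values are made up, and the conversion goes through toArray, so a sparse input becomes dense.

```scala
import breeze.linalg.{DenseVector => BDV}
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object BreezeScaleSketch {
  // Hypothetical helper (not a Spark API): wrap the Spark vector's values
  // in a Breeze DenseVector so that Breeze operators become available.
  def toBreeze(v: Vector): BDV[Double] = new BDV[Double](v.toArray)

  def main(args: Array[String]): Unit = {
    val x = Vectors.dense(1.0, 2.0, 3.0)  // stands in for data_point.vector
    val u = 0.5                           // stands in for u_i_j_m

    // Breeze defines * for DenseVector[Double] and Double; the result is a new vector.
    val scaled: BDV[Double] = toBreeze(x) * u

    // Accumulate into a Breeze vector, similar to c(j) in the question.
    val acc = BDV.zeros[Double](x.size)
    acc += scaled

    // Convert back to a Spark Vector if the rest of the code needs one.
    val back: Vector = Vectors.dense(acc.toArray)
    println(back)
  }
}
```

Keeping the accumulators as Breeze vectors inside mapPartitions and converting back only at the end avoids repeated conversions per data point.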

kosii answered Jan 10 '23 18:01