I have read somewhere that MLlib local vectors and matrices currently wrap the Breeze implementation, but the methods that convert MLlib vectors/matrices to Breeze ones are private to the org.apache.spark.mllib scope. The suggested workaround is to write your code in an org.apache.spark.mllib.something package.
Is there a better way to do this? Can you cite some relevant examples?
Thanks and regards,
An IndexedRowMatrix is similar to a RowMatrix but with meaningful row indices. It is backed by an RDD of indexed rows, so that each row is represented by its index (long-typed) and a local vector. An IndexedRowMatrix can be created from an RDD[IndexedRow] instance, where IndexedRow is a wrapper over (Long, Vector).
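As a rough illustration of that description, here is a minimal sketch that builds an IndexedRowMatrix from an RDD[IndexedRow]. The object name IndexedRowMatrixSketch, the helper build, and the sample values are made up for this example, and it assumes spark-mllib is on the classpath and a local master is acceptable.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

// Hypothetical sketch object; not part of Spark.
object IndexedRowMatrixSketch {
  // Each IndexedRow pairs a long-typed row index with a local vector.
  // Row index 1 is deliberately left out: indices need not be contiguous.
  def build(sc: SparkContext): IndexedRowMatrix = {
    val rows = sc.parallelize(Seq(
      IndexedRow(0L, Vectors.dense(1.0, 2.0)),
      IndexedRow(2L, Vectors.dense(3.0, 4.0))
    ))
    new IndexedRowMatrix(rows)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: running locally; adjust the master for a real cluster.
    val conf = new SparkConf().setAppName("IndexedRowMatrixSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val mat = build(sc)
      println(s"rows = ${mat.numRows()}, cols = ${mat.numCols()}")
    } finally {
      sc.stop()
    }
  }
}
```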
Distributed matrix. A distributed matrix has long-typed row and column indices and double-typed values, stored distributively in one or more RDDs. It is very important to choose the right format to store large and distributed matrices.
I used the same solution that @dlwh suggested. Here is the code that does it:
package org.apache.spark.mllib.linalg

object VectorPub {
  implicit class VectorPublications(val vector: Vector) extends AnyVal {
    def toBreeze: breeze.linalg.Vector[scala.Double] = vector.toBreeze
  }

  implicit class BreezeVectorPublications(val breezeVector: breeze.linalg.Vector[Double]) extends AnyVal {
    def fromBreeze: Vector = Vectors.fromBreeze(breezeVector)
  }
}
Note that the implicit classes extend AnyVal to avoid allocating a new wrapper object each time those methods are called.
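To show the value-class point in isolation, here is a minimal sketch in plain Scala with no Spark or Breeze dependency. The names DoubleSyntax, DoubleOps and squared are invented for this illustration and do not come from any library.

```scala
object DoubleSyntax {
  // Value class: because it extends AnyVal and has a single val field,
  // the compiler can usually call `squared` without allocating a
  // DoubleOps wrapper on the heap.
  implicit class DoubleOps(val value: Double) extends AnyVal {
    def squared: Double = value * value
  }
}

object ValueClassDemo {
  import DoubleSyntax._

  def main(args: Array[String]): Unit = {
    val x = 3.0
    // `squared` is not a member of Double, so the implicit class is applied.
    println(x.squared)
  }
}
```

The same shape is what VectorPub uses: a one-field implicit wrapper that only adds methods.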
My solution is kind of a hybrid of those of @barclar and @lev, above. You don't need to put your code in the org.apache.spark.mllib.linalg package if you don't make use of the spark-ml implicit conversions. You can define your own implicit conversions in your own package, like:
package your.package
import org.apache.spark.ml.linalg.DenseVector
import org.apache.spark.ml.linalg.SparseVector
import org.apache.spark.ml.linalg.Vector
import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
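object BreezeConverters {
  // Spark ML -> Breeze. The dense case wraps the existing array without copying.
  implicit def toBreeze(dv: DenseVector): BDV[Double] =
    new BDV[Double](dv.values)

  implicit def toBreeze(sv: SparseVector): BSV[Double] =
    new BSV[Double](sv.indices, sv.values, sv.size)

  implicit def toBreeze(v: Vector): BV[Double] =
    v match {
      case dv: DenseVector => toBreeze(dv)
      case sv: SparseVector => toBreeze(sv)
    }

  // Breeze -> Spark ML.
  implicit def fromBreeze(dv: BDV[Double]): DenseVector =
    new DenseVector(dv.toArray)

  // A Breeze sparse vector may over-allocate its backing arrays, so keep
  // only the first `used` entries.
  implicit def fromBreeze(sv: BSV[Double]): SparseVector =
    new SparseVector(sv.length, sv.index.take(sv.used), sv.data.take(sv.used))

  implicit def fromBreeze(bv: BV[Double]): Vector =
    bv match {
      case dv: BDV[Double] => fromBreeze(dv)
      case sv: BSV[Double] => fromBreeze(sv)
      // Other Breeze vector types (e.g. HashVector) are not handled here.
      case other => sys.error("Unsupported Breeze vector type: " + other.getClass.getName)
    }
}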
Then you can import these implicits into your code with:
import your.package.BreezeConverters._
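For a sense of what these conversions do at a call site, here is a rough sketch that spells out the dense-vector round trip by hand, using only the constructors shown in the converters above. BreezeRoundTrip and add are hypothetical names for illustration, and the sketch assumes the Spark ML linalg classes (Spark 2.x or later) and Breeze are on the classpath.

```scala
import breeze.linalg.{DenseVector => BDV}
import org.apache.spark.ml.linalg.DenseVector

// Hypothetical helper; not part of Spark or Breeze.
object BreezeRoundTrip {
  // Adds two Spark ML dense vectors by going through Breeze and back.
  def add(a: DenseVector, b: DenseVector): DenseVector = {
    // Wrap the Spark arrays as Breeze vectors; `+` allocates a new result.
    val sum: BDV[Double] = new BDV[Double](a.values) + new BDV[Double](b.values)
    // Copy the Breeze result into a Spark ML vector.
    new DenseVector(sum.toArray)
  }

  def main(args: Array[String]): Unit = {
    val a = new DenseVector(Array(1.0, 2.0, 3.0))
    val b = new DenseVector(Array(4.0, 5.0, 6.0))
    println(add(a, b))
  }
}
```

With the implicit conversions in scope, the explicit `new BDV[Double](...)` and `new DenseVector(...)` wrapping is what gets inserted for you.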