MLlib-to-Breeze vector/matrix conversions are private to org.apache.spark.mllib scope?

I have read that MLlib's local vectors and matrices currently wrap the Breeze implementations, but the methods that convert MLlib vectors/matrices to their Breeze counterparts are private to the org.apache.spark.mllib scope. The suggested workaround is to write your code in an org.apache.spark.mllib.something package.

Is there a better way to do this? Can you cite some relevant examples?

Thanks and regards,

asked Oct 30 '14 by learning_spark


People also ask

What is the basic difference between a RowMatrix and an IndexedRowMatrix?

An IndexedRowMatrix is similar to a RowMatrix but with meaningful row indices. It is backed by an RDD of indexed rows, so that each row is represented by its index (long-typed) and a local vector. An IndexedRowMatrix can be created from an RDD[IndexedRow] instance, where IndexedRow is a wrapper over (Long, Vector).
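
For instance, a minimal sketch of building one (this assumes an existing SparkContext named sc; the values are purely illustrative):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

// each row pairs an explicit Long index with a local vector
val rows = sc.parallelize(Seq(
  IndexedRow(0L, Vectors.dense(1.0, 2.0)),
  IndexedRow(1L, Vectors.dense(3.0, 4.0))
))

val indexedMatrix = new IndexedRowMatrix(rows)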

What is a distributed matrix?

A distributed matrix has long-typed row and column indices and double-typed values, stored distributively in one or more RDDs. It is very important to choose the right format to store large, distributed matrices.
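
For example, the basic RowMatrix format can be built as in this small sketch (again assuming a SparkContext named sc):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// a RowMatrix is an RDD of local vectors with no meaningful row indices
val mat = new RowMatrix(sc.parallelize(Seq(
  Vectors.dense(1.0, 2.0),
  Vectors.dense(3.0, 4.0)
)))

println(s"dimensions: ${mat.numRows()} x ${mat.numCols()}")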


2 Answers

I used the same solution @dlwh suggested. Here is the code that does it:

package org.apache.spark.mllib.linalg

// defined inside the org.apache.spark.mllib.linalg package so it can reach the
// package-private toBreeze / fromBreeze methods and re-expose them publicly
object VectorPub {

  implicit class VectorPublications(val vector: Vector) extends AnyVal {
    def toBreeze: breeze.linalg.Vector[scala.Double] = vector.toBreeze
  }

  implicit class BreezeVectorPublications(val breezeVector: breeze.linalg.Vector[Double]) extends AnyVal {
    def fromBreeze: Vector = Vectors.fromBreeze(breezeVector)
  }
}

Notice that the implicit classes extend AnyVal to avoid allocating a wrapper object when calling these methods.
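
A quick usage sketch (the vector values are just for illustration):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.VectorPub._

val a = Vectors.dense(1.0, 2.0, 3.0)
val b = Vectors.dense(0.5, 0.5, 0.5)

// convert to Breeze, do the arithmetic there, then convert back to an MLlib Vector
val sum = (a.toBreeze + b.toBreeze).fromBreeze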

answered by lev


My solution is a kind of hybrid of @barclar's and @lev's, above. You don't need to put your code in the org.apache.spark.mllib.linalg package if you don't make use of Spark's package-private conversions. You can define your own implicit conversions in your own package, like:

package your.package

import org.apache.spark.ml.linalg.DenseVector
import org.apache.spark.ml.linalg.SparseVector
import org.apache.spark.ml.linalg.Vector
import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}

object BreezeConverters
{
    // Spark ml.linalg vectors -> Breeze vectors
    implicit def toBreeze( dv: DenseVector ): BDV[Double] =
        new BDV[Double](dv.values)

    implicit def toBreeze( sv: SparseVector ): BSV[Double] =
        new BSV[Double](sv.indices, sv.values, sv.size)

    implicit def toBreeze( v: Vector ): BV[Double] =
        v match {
            case dv: DenseVector => toBreeze(dv)
            case sv: SparseVector => toBreeze(sv)
        }

    // Breeze vectors -> Spark ml.linalg vectors
    implicit def fromBreeze( dv: BDV[Double] ): DenseVector =
        new DenseVector(dv.toArray)

    implicit def fromBreeze( sv: BSV[Double] ): SparseVector =
        // a Breeze sparse vector may carry spare capacity in its index/data arrays,
        // so copy only the active entries
        new SparseVector(sv.length, sv.index.slice(0, sv.used), sv.data.slice(0, sv.used))

    implicit def fromBreeze( bv: BV[Double] ): Vector =
        bv match {
            case dv: BDV[Double] => fromBreeze(dv)
            case sv: BSV[Double] => fromBreeze(sv)
        }
}

Then you can import these implicits into your code with:

import your.package.BreezeConverters._
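
For example, a short sketch of the conversions in action (values are illustrative):

import org.apache.spark.ml.linalg.{Vector, Vectors}
import breeze.linalg.{Vector => BV}
import your.package.BreezeConverters._

val v: Vector = Vectors.dense(1.0, 2.0, 3.0)

// toBreeze is applied implicitly where a Breeze vector is expected
val bv: BV[Double] = v

// fromBreeze converts a Breeze result back to a Spark Vector
val back: Vector = bv + bv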
answered by corvi42