I'm trying to implement neural networks in Spark with Scala, but I can't find a way to perform vector or matrix multiplication. Spark provides two vector types. The spark.util Vector supports a dot operation, but it is deprecated. The mllib.linalg vectors do not support such operations in Scala.
Which one should I use to store weights and training data?
How can I perform vector multiplication in Spark Scala with MLlib, such as w*x, where w is a vector or matrix of weights and x is the input? PySpark vectors support a dot product, but in Scala I can't find such a function on the vectors.
Well, if you need full support for linear algebra operators, you have to implement them yourself or use an external library. In the second case the obvious choice is Breeze.
It is already used behind the scenes, so it doesn't introduce additional dependencies, and you can easily adapt existing Spark code for the conversions:
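import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector}

// Convert a Spark MLlib vector to a Breeze vector by wrapping its arrays.
def toBreeze(v: Vector): BV[Double] = v match {
  case DenseVector(values) => new BDV[Double](values)
  case SparseVector(size, indices, values) => new BSV[Double](indices, values, size)
}

// Convert a Breeze vector back to a Spark MLlib vector.
def toSpark(v: BV[Double]): Vector = v match {
  case v: BDV[Double] => new DenseVector(v.toArray)
  // A Breeze sparse vector may hold unused capacity, so keep only the first `used` entries.
  case v: BSV[Double] => new SparseVector(v.length, v.index.take(v.used), v.data.take(v.used))
}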
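Once the data is in Breeze form, the w*x from the question can be written directly with Breeze operators. The following is only a minimal sketch: the object name `BreezeForwardPass` and the weight and input values are made up for illustration, and it assumes Breeze is on the classpath (as it is with Spark MLlib).

```scala
import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV}

// Hypothetical example values; not taken from the original question.
object BreezeForwardPass {
  val w: BDV[Double] = BDV(0.5, -1.0, 2.0) // weight vector
  val x: BDV[Double] = BDV(1.0, 2.0, 3.0)  // input vector

  // Inner (dot) product of the weight vector and the input.
  val activation: Double = w dot x

  // A 2x3 weight matrix, written one tuple per row.
  val W: BDM[Double] = BDM((1.0, 0.0, 0.0), (0.0, 1.0, 1.0))

  // Matrix-vector product; the result has one entry per row of W.
  val y: BDV[Double] = W * x
}
```

A result such as `y` is a Breeze dense vector, so it can be passed to a conversion like `toSpark` whenever a Spark vector is needed again.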
Mahout also provides Spark and Scala bindings that you may find interesting.
For simple matrix-vector multiplications it can be easier to leverage existing matrix methods. For example, IndexedRowMatrix and RowMatrix provide multiply methods which can take a local matrix. You can check Matrix Multiplication in Apache Spark for an example usage.
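As a rough sketch of these built-in methods: a local mllib.linalg Matrix can multiply a vector without any Breeze conversion, and RowMatrix.multiply takes a local matrix as its right-hand side. The names `LocalMultiply` and `project` and all values are hypothetical, and the code assumes a Spark version in which `Matrix.multiply` accepts a `Vector` (1.4 or later, as far as I know).

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.{DenseVector, Matrices, Matrix, Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Hypothetical example; names and values are for illustration only.
object LocalMultiply {
  // Matrices.dense expects values in column-major order.
  // This is the 2x3 matrix with rows (1, 0, 0) and (0, 1, 1).
  val weights: Matrix = Matrices.dense(2, 3, Array(1.0, 0.0, 0.0, 1.0, 0.0, 1.0))
  val input: Vector = Vectors.dense(1.0, 2.0, 3.0)

  // Local matrix-vector product; no SparkContext is needed.
  val output: DenseVector = weights.multiply(input)

  // Distributed case: each row of the RowMatrix is multiplied by the
  // local matrix `w`, which must have as many rows as the data has columns.
  def project(sc: SparkContext, rows: Seq[Vector], w: Matrix): RowMatrix =
    new RowMatrix(sc.parallelize(rows)).multiply(w)
}
```

Note the orientation in the distributed case: the data rows sit on the left and the local matrix on the right, so the weights may need to be stored transposed compared with the w*x form.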