How to convert from org.apache.spark.mllib.linalg.Matrix to RDD[org.apache.spark.mllib.linalg.Vector] in Spark?
The matrix is generated from SVD, and I am using the results from SVD to do clustering analysis.
MLlib's Matrix is a small local matrix. It would probably be more efficient to analyze it locally instead of turning it into an RDD.
Anyway, if your clustering only supports RDD as its input, here's how you can do the transformation:
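import org.apache.spark.mllib.linalg._
import org.apache.spark.rdd.RDD

// Assumes `sc` is a SparkContext already in scope, as in spark-shell.
def toRDD(m: Matrix): RDD[Vector] = {
  // toArray returns the values in column-major order, so each group is one column.
  val columns = m.toArray.grouped(m.numRows)
  val rows = columns.toSeq.transpose // Skip this if you want a column-major RDD.
  val vectors = rows.map(row => Vectors.dense(row.toArray))
  sc.parallelize(vectors)
}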
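To see why the grouped/transpose step yields rows, here is a minimal sketch of the same reshaping on a plain array, with no Spark involved. The object ColumnMajorDemo and the helper rowsOf are hypothetical names used only for illustration; the only assumption carried over from the answer is that the values are stored in column-major order.

```scala
object ColumnMajorDemo {
  // Hypothetical helper: split a column-major array into a sequence of rows.
  def rowsOf(values: Array[Double], numRows: Int): Seq[Seq[Double]] = {
    // Each chunk of numRows consecutive values is one column of the matrix.
    val columns = values.grouped(numRows).map(_.toList).toList
    // Transposing the list of columns gives the list of rows.
    columns.transpose
  }

  def main(args: Array[String]): Unit = {
    // The 2 x 3 matrix [[1, 3, 5], [2, 4, 6]], stored column by column.
    val values = Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
    rowsOf(values, numRows = 2).foreach(row => println(row.mkString(" ")))
    // prints:
    // 1.0 3.0 5.0
    // 2.0 4.0 6.0
  }
}
```

Without the transpose, each element would be a column of the matrix (1.0 2.0, then 3.0 4.0, then 5.0 6.0), which corresponds to the column-major RDD mentioned in the comment above.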