 

How to convert a Matrix to RDD[Vector] in Spark

How to convert from org.apache.spark.mllib.linalg.Matrix to RDD[org.apache.spark.mllib.linalg.Vector] in Spark?

The matrix is generated from SVD, and I am using the results from SVD to do clustering analysis.

Jiang Xiang asked Jan 26 '15 20:01


1 Answer

MLlib's Matrix is a local matrix: it lives entirely on the driver and is meant to be small. It would probably be more efficient to analyze it locally than to turn it into an RDD.
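If the matrix fits on the driver, its rows can be read directly out of its column-major value array without creating any RDD. Below is a minimal sketch under that assumption: the inputs stand for what `m.toArray` and `m.numRows` would return for an MLlib Matrix `m`, while `LocalRows` and `rowsOf` are hypothetical names. The demo uses a hard-coded 2 x 3 matrix instead of a real SVD result so it runs without Spark.

```scala
// Hypothetical helper: rebuilds the rows of a numRows x numCols matrix from
// its column-major values, entirely on the driver (no RDD involved).
// For an MLlib Matrix `m`, the inputs would be m.toArray and m.numRows.
object LocalRows {
  def rowsOf(values: Array[Double], numRows: Int): Array[Array[Double]] = {
    require(numRows > 0 && values.length % numRows == 0, "values must fill whole columns")
    val numCols = values.length / numRows
    // In column-major order, entry (i, j) is stored at index i + j * numRows.
    Array.tabulate(numRows, numCols)((i, j) => values(i + j * numRows))
  }

  def main(args: Array[String]): Unit = {
    // The 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored column by column.
    val colMajor = Array(1.0, 4.0, 2.0, 5.0, 3.0, 6.0)
    rowsOf(colMajor, 2).foreach(row => println(row.mkString(", ")))
    // prints "1.0, 2.0, 3.0" and then "4.0, 5.0, 6.0"
  }
}
```

The resulting `Array[Array[Double]]` can then be fed to any local clustering code.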

That said, if your clustering algorithm only accepts an RDD as its input, here's how you can do the conversion:

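import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.{Matrix, Vector, Vectors}
import org.apache.spark.rdd.RDD

// Turns a local MLlib Matrix into an RDD with one Vector per matrix row.
def toRDD(sc: SparkContext, m: Matrix): RDD[Vector] = {
  // m.toArray is column-major, so every numRows consecutive values form one column.
  val columns = m.toArray.grouped(m.numRows).map(_.toList).toList
  val rows = columns.transpose // Skip this if you want one Vector per column instead.
  val vectors = rows.map(row => Vectors.dense(row.toArray))
  sc.parallelize(vectors)
}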
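To see what the `grouped` and `transpose` steps do, here is a plain-Scala mirror of them applied to a hard-coded column-major array, so it can be checked without a SparkContext. The object and method names are hypothetical, and the array stands in for what `Matrix.toArray` would return for the 2 x 3 matrix [[1, 2, 3], [4, 5, 6]].

```scala
// Plain-Scala mirror of the conversion steps, without Spark.
object GroupedTransposeDemo {
  // Each run of numRows consecutive column-major values is one column.
  def columnsOf(values: Array[Double], numRows: Int): List[List[Double]] =
    values.grouped(numRows).map(_.toList).toList

  // Transposing the list of columns yields the list of rows.
  def rowsOf(values: Array[Double], numRows: Int): List[List[Double]] =
    columnsOf(values, numRows).transpose

  def main(args: Array[String]): Unit = {
    val values = Array(1.0, 4.0, 2.0, 5.0, 3.0, 6.0)
    println(columnsOf(values, 2)) // List(List(1.0, 4.0), List(2.0, 5.0), List(3.0, 6.0))
    println(rowsOf(values, 2))    // List(List(1.0, 2.0, 3.0), List(4.0, 5.0, 6.0))
  }
}
```

Without the transpose you get one entry per column (three vectors of length 2 here); with it you get one entry per row (two vectors of length 3), which is usually what a clustering algorithm expects when each row is a data point.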
Daniel Darabos answered Oct 01 '22 02:10