The result of a correlation in Spark MLlib is of type org.apache.spark.mllib.linalg.Matrix (see http://spark.apache.org/docs/1.2.1/mllib-statistics.html#correlations).
val data: RDD[Vector] = ...
val correlMatrix: Matrix = Statistics.corr(data, "pearson")
I would like to save the result into a file. How can I do this?
Here is a simple approach that saves the Matrix to HDFS and lets you choose the separator.
(The transpose is needed because .toArray returns the values in column-major order, so transposing first yields the original rows.)
val localMatrix: List[Array[Double]] = correlMatrix
  .transpose // Transpose since .toArray is column major
  .toArray
  .grouped(correlMatrix.numCols)
  .toList
val lines: List[String] = localMatrix
  .map(line => line.mkString(" "))
sc.parallelize(lines)
  .repartition(1)
  .saveAsTextFile("hdfs:///home/user/spark/correlMatrix.txt")
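The regrouping that the transpose achieves can also be written out by hand, which may make the column-major layout easier to see. The following is only a minimal sketch in plain Scala with no Spark dependency. It assumes a column-major array like the one .toArray returns, and the names ColumnMajorDemo, toLines and writeLines, as well as the sample 2x3 values, are hypothetical and introduced here purely for illustration.

```scala
import java.io.PrintWriter

object ColumnMajorDemo {
  // Turn a column-major array into one string per matrix row.
  // In column-major order, element (i, j) sits at index j * numRows + i.
  def toLines(values: Array[Double], numRows: Int, numCols: Int, sep: String): Seq[String] =
    (0 until numRows).map { i =>
      (0 until numCols).map(j => values(j * numRows + i)).mkString(sep)
    }

  // Write the lines to a local file on the driver (hypothetical helper).
  def writeLines(path: String, lines: Seq[String]): Unit = {
    val out = new PrintWriter(path)
    try lines.foreach(l => out.println(l))
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    // Sample matrix [[1, 2, 3], [4, 5, 6]] stored column by column.
    val values = Array(1.0, 4.0, 2.0, 5.0, 3.0, 6.0)
    toLines(values, 2, 3, " ").foreach(println)
  }
}
```

Since the correlation matrix is a local object held on the driver, a helper along the lines of writeLines could be used instead of an RDD when a file on the driver's local filesystem is enough; for HDFS, the saveAsTextFile approach above still applies.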