I'm using Apache Spark's MLlib with Scala. I need to convert a group of Vectors
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint
into LabeledPoints in order to apply MLlib's algorithms.
Each vector is composed of Double values of 0.0 (false) or 1.0 (true).
All the vectors are saved in an RDD, so the final RDD is of the type
val data_tmp: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]
So, the vectors in the RDD are created with
def createArray(values: List[String]): Vector = {
  // One slot per known tag: 1.0 if the tag is present in values, 0.0 otherwise
  val arr = new Array[Double](tags_table.size)
  tags_table.foreach(x => arr(x._2) = if (values.contains(x._1)) 1.0 else 0.0)
  Vectors.dense(arr)
}
/* each element of result is a List[String] */
val data_tmp = result.map(x => createArray(x._2))

import org.apache.spark.mllib.linalg.distributed.RowMatrix
val data: RowMatrix = new RowMatrix(data_tmp)
How can I create a LabeledPoint set from this RDD (data_tmp) or from the RowMatrix (data), so I can use the MLlib algorithms? For example, I need to apply the linear SVM algorithm shown here.
I found the solution:
def createArray(values: List[String]): Vector = {
  // One slot per known tag: 1.0 if the tag is present in values, 0.0 otherwise
  val arr = new Array[Double](tags_table.size)
  tags_table.foreach(x => arr(x._2) = if (values.contains(x._1)) 1.0 else 0.0)
  Vectors.dense(arr)
}
val data_tmp = result.map(x => createArray(x._2))
val parsedData = data_tmp.map { line => LabeledPoint(1.0, line) }
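Once you have an RDD[LabeledPoint], feeding it to the linear SVM is straightforward. Below is a minimal sketch based on the standard MLlib SVMWithSGD workflow; the split ratios and numIterations are illustrative choices, not values from the question. Note that in the snippet above every point gets the hard-coded label 1.0, so a classifier trained on it would learn nothing useful — in practice the label should come from your data.

```scala
import org.apache.spark.mllib.classification.SVMWithSGD

// Split the labeled data into training (60%) and test (40%) sets
val splits = parsedData.randomSplit(Array(0.6, 0.4), seed = 11L)
val training = splits(0).cache()
val test = splits(1)

// Train a linear SVM; numIterations is a tunable parameter
val numIterations = 100
val model = SVMWithSGD.train(training, numIterations)

// Score the test set: pairs of (predicted score, true label)
val scoreAndLabels = test.map { point =>
  (model.predict(point.features), point.label)
}
```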