
Predict clusters from data using Spark MLlib KMeans

I have generated cluster centers from the features of my data, say 'kmeans_data.txt', which you can find at

https://github.com/apache/spark/blob/master/data/mllib/kmeans_data.txt

This was done using KMeans in Spark MLlib.

clusters.clusterCenters.foreach(println)

Any idea how to predict which cluster each point in this data belongs to?

Taiwotman asked Mar 22 '16 02:03



2 Answers

The following is an excerpt from the KMeans clustering example in the Spark MLlib documentation (Scala), with one line added at the end:

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data
val data = sc.textFile("data/mllib/kmeans_data.txt")
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble)))

// Cluster the data into two classes using KMeans
val numClusters = 2
val numIterations = 20
val clusters = KMeans.train(parsedData, numClusters, numIterations)

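// Here is what I added: predict the cluster index of each data point.
// collect() brings the results to the driver so they print there
// rather than on the executors.
clusters.predict(parsedData).collect().foreach(println)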
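
To see which point landed in which cluster, and to classify a point that was not in the training data, something like the following may help. It is only a sketch: it assumes a local Spark master, inlines the six rows of kmeans_data.txt instead of reading the file, and uses two made-up query points. Cluster indices depend on the random initialisation, so no specific index is shown.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Assumption: local master; in spark-shell, getOrCreate reuses the existing context.
val conf = new SparkConf().setAppName("kmeans-predict-sketch").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)

// Inline copy of the six rows in data/mllib/kmeans_data.txt
val parsedData = sc.parallelize(Seq(
  Vectors.dense(0.0, 0.0, 0.0),
  Vectors.dense(0.1, 0.1, 0.1),
  Vectors.dense(0.2, 0.2, 0.2),
  Vectors.dense(9.0, 9.0, 9.0),
  Vectors.dense(9.1, 9.1, 9.1),
  Vectors.dense(9.2, 9.2, 9.2)
)).cache()

// Same settings as above: 2 clusters, at most 20 iterations
val clusters = KMeans.train(parsedData, 2, 20)

// Keep each point next to its predicted cluster index
val assigned = parsedData.map(p => (p, clusters.predict(p))).collect()
assigned.foreach { case (p, c) => println(s"$p -> cluster $c") }

// predict also accepts a single Vector; these two query points are hypothetical
val nearOrigin = clusters.predict(Vectors.dense(0.05, 0.05, 0.05))
val nearNine = clusters.predict(Vectors.dense(9.05, 9.05, 9.05))
println(s"near origin: $nearOrigin, near nine: $nearNine")
```

The two query points should end up in different clusters, since one sits among the rows near 0 and the other among the rows near 9.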
Taiwotman answered Sep 19 '22 19:09



It's pretty simple: if you read the KMeansModel documentation, you will notice that it has two constructors, one of which is:

new KMeansModel(clusterCenters: Array[Vector])

Therefore, you can instantiate a model directly from the centroids that KMeans produced and call predict on it. Here is an example:

import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.linalg.Vectors

val rdd = sc.parallelize(List(
  Vectors.dense(Array(-0.1, 0.0, 0.0)), 
  Vectors.dense(Array(9.0, 9.0, 9.0)), 
  Vectors.dense(Array(3.0, 2.0, 1.0))))

val centroids = Array(
  Vectors.dense(Array(0.0, 0.0, 0.0)), 
  Vectors.dense(Array(0.1, 0.1, 0.1)),
  Vectors.dense(Array(0.2, 0.2, 0.2)),
  Vectors.dense(Array(9.0, 9.0, 9.0)),
  Vectors.dense(Array(9.1, 9.1, 9.1)),
  Vectors.dense(Array(9.2, 9.2, 9.2)))

val model = new KMeansModel(clusterCenters=centroids)

model.predict(rdd).take(10)

// res13: Array[Int] = Array(0, 3, 2)
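
To make the result above easier to check by hand, here is a plain-Scala sketch (no Spark needed) of nearest-centroid assignment. It assumes squared Euclidean distance as the closeness measure; the helper names squaredDistance and nearestCentroid are hypothetical and are not part of MLlib.

```scala
// Squared Euclidean distance between two points of equal dimension
def squaredDistance(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

// Index of the centroid closest to the given point
def nearestCentroid(point: Array[Double], centroids: Array[Array[Double]]): Int =
  centroids.indices.minBy(i => squaredDistance(point, centroids(i)))

// Same centroids and points as in the example above
val centroids = Array(
  Array(0.0, 0.0, 0.0),
  Array(0.1, 0.1, 0.1),
  Array(0.2, 0.2, 0.2),
  Array(9.0, 9.0, 9.0),
  Array(9.1, 9.1, 9.1),
  Array(9.2, 9.2, 9.2))

val points = List(
  Array(-0.1, 0.0, 0.0),
  Array(9.0, 9.0, 9.0),
  Array(3.0, 2.0, 1.0))

val assignments = points.map(p => nearestCentroid(p, centroids))
println(assignments) // List(0, 3, 2)
```

The third point (3.0, 2.0, 1.0) goes to index 2 because (0.2, 0.2, 0.2) is the closest of the low-valued centroids: its squared distance is 11.72, against 12.83 for index 1 and 14.0 for index 0.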
Alberto Bonsanto answered Sep 19 '22 19:09
