
Weka simple K-means clustering assignments

I have what feels like a simple problem, but I can't seem to find an answer. I'm pretty new to Weka, but I feel like I've done a bit of research on this (at least read through the first couple of pages of Google results) and come up dry.

I am using Weka to run clustering with Simple K-Means. In the result list I have no problem visualizing my output ("Visualize cluster assignments"), and it is clear both from my understanding of the K-Means algorithm and from Weka's output that each of my instances ends up as a member of exactly one cluster (centered around a particular centroid, if you will).

I can see something of the cluster composition from the text output. However, Weka provides no explicit "mapping" from instance number to cluster number. I would like something like:

instance 1 --> cluster 0
instance 2 --> cluster 0
instance 3 --> cluster 2
instance 4 --> cluster 1
... etc.

How do I obtain these results without calculating the distance from each item to each centroid on my own?

machine yearning asked Jul 13 '11 21:07




2 Answers

I had the same problem and figured it out. I am posting the method here in case anyone needs it:

It's actually quite simple: you have to use Weka's Java API.

import weka.clusterers.SimpleKMeans;

// "instances" is your loaded weka.core.Instances data set,
// "numberOfClusters" is the k you want
SimpleKMeans kmeans = new SimpleKMeans();
kmeans.setSeed(10);

// This is the important parameter to set: getAssignments() needs it
kmeans.setPreserveInstancesOrder(true);
kmeans.setNumClusters(numberOfClusters);
kmeans.buildClusterer(instances);

// This array holds the cluster number (starting with 0) for each instance
// The array has as many elements as the number of instances
int[] assignments = kmeans.getAssignments();

for (int i = 0; i < assignments.length; i++) {
    // %n ends the line so each mapping prints on its own row
    System.out.printf("Instance %d -> Cluster %d%n", i, assignments[i]);
}

amon.gammon answered Oct 31 '22 12:10


Aha, I think I found what I was looking for. Under the cluster visualizer, click "Save". This saves the whole data set as an ARFF file almost identical to the input file I provided, but with 2 new attributes: the first attribute is the index of the instance, while the last attribute is the cluster assignment. Now I just have to parse the crap out of it!
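
For the parsing step, here is a minimal sketch in plain Java (no Weka dependency). It assumes the saved file is a dense, non-sparse ARFF with no quoted commas in the data rows, that the first field of each row is the instance index, and that the last field is a label such as cluster0 or cluster2. The class name ArffClusterAssignments, the parse method, and the attribute names in the sample are my own illustrations, not anything Weka defines, so check them against your actual file.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ArffClusterAssignments {

    // Maps the first field of each data row (instance index) to the
    // cluster number extracted from the last field (assumed "cluster<N>").
    public static Map<Integer, Integer> parse(List<String> lines) {
        Map<Integer, Integer> assignments = new LinkedHashMap<>();
        boolean inData = false;
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and ARFF comments
            if (line.isEmpty() || line.startsWith("%")) {
                continue;
            }
            // Everything before @data is header (relation and attributes)
            if (!inData) {
                if (line.equalsIgnoreCase("@data")) {
                    inData = true;
                }
                continue;
            }
            String[] fields = line.split(",");
            // parseDouble accepts both "3" and "3.0"
            int instance = (int) Double.parseDouble(fields[0].trim());
            String label = fields[fields.length - 1].trim();
            // Keep only the digits of the label, e.g. "cluster2" becomes 2
            int cluster = Integer.parseInt(label.replaceAll("\\D", ""));
            assignments.put(instance, cluster);
        }
        return assignments;
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for the file saved from the visualizer
        List<String> sample = List.of(
            "@relation sample_clustered",
            "@attribute Instance_number numeric",
            "@attribute x numeric",
            "@attribute Cluster {cluster0,cluster1,cluster2}",
            "@data",
            "0,1.0,cluster0",
            "1,1.2,cluster0",
            "2,9.5,cluster2",
            "3,5.1,cluster1");
        parse(sample).forEach((instance, cluster) ->
            System.out.printf("instance %d --> cluster %d%n", instance, cluster));
    }
}
```

To run it on a real file, replace the sample list with java.nio.file.Files.readAllLines(...) on the path you saved to. A row whose last field has no digits (for example a missing value) will throw a NumberFormatException in this sketch.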

machine yearning answered Oct 31 '22 14:10