I have what feels like a simple problem, but I can't seem to find an answer. I'm pretty new to Weka, but I feel like I've done a bit of research on this (at least read through the first couple of pages of Google results) and come up dry.
I am using Weka to run clustering using Simple K-Means. In the results list I have no problem visualizing my output ("Visualize cluster assignments") and it is clear both from my understanding of the K-Means algorithm and the output of Weka that each of my instances is ending up as a member of a different cluster (centered around a particular centroid, if you will).
I can see something of the cluster composition from the text output. However, Weka gives me no explicit "mapping" from instance number to cluster number. I would like something like:
instance 1 --> cluster 0
instance 2 --> cluster 0
instance 3 --> cluster 2
instance 4 --> cluster 1
... etc.
How do I obtain these results without calculating the distance from each item to each centroid on my own?
The WEKA SimpleKMeans algorithm uses the Euclidean distance measure by default to compute distances between instances and cluster centroids. To perform clustering, select the "Cluster" tab in the Explorer and click the "Choose" button. This opens a drop-down list of the available clustering algorithms.
K-means assigns every data point in the dataset to the nearest centroid, meaning that a data point is considered to be in a particular cluster if it is closer to that cluster's centroid than any other centroid.
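To make the assignment rule concrete, here is a minimal plain-Java sketch of "nearest centroid wins". It is not Weka's implementation: the class name `NearestCentroidSketch`, the sample points, and the centroids are all made up for illustration, and it skips the attribute normalization that Weka's Euclidean distance applies by default.

```java
final class NearestCentroidSketch {

    /** Squared Euclidean distance between two equal-length vectors. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns, for each point, the index of its closest centroid. */
    static int[] assign(double[][] points, double[][] centroids) {
        int[] assignments = new int[points.length];
        for (int p = 0; p < points.length; p++) {
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.length; c++) {
                // Comparing squared distances gives the same winner as
                // comparing true distances, without the square root.
                double dist = squaredDistance(points[p], centroids[c]);
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c;
                }
            }
            assignments[p] = best;
        }
        return assignments;
    }

    public static void main(String[] args) {
        // Hypothetical data, chosen only to illustrate the rule.
        double[][] points = {{1.0, 1.0}, {1.2, 0.8}, {9.0, 9.5}, {5.0, 5.1}};
        double[][] centroids = {{1.0, 1.0}, {5.0, 5.0}, {9.0, 9.0}};
        int[] assignments = assign(points, centroids);
        for (int i = 0; i < assignments.length; i++) {
            System.out.printf("instance %d --> cluster %d%n", i + 1, assignments[i]);
        }
    }
}
```

This is exactly the by-hand calculation the question wants to avoid, so it is only meant to show what the mapping represents.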
Step 1: Open the Weka Explorer and, on the "Preprocess" tab, load the required dataset; here we use the iris.arff dataset.
Step 2: Go to the "Cluster" tab in the Explorer and press the "Choose" button to select a clustering algorithm.
I had the same problem and figured it out. I am posting the method here in case anyone needs it:
It's actually quite simple: you have to use Weka's Java API.
import weka.clusterers.SimpleKMeans;

// "instances" (weka.core.Instances) and "numberOfClusters" are assumed to be defined already
SimpleKMeans kmeans = new SimpleKMeans();
kmeans.setSeed(10);
// This is the important parameter to set: without it, getAssignments() throws an exception
kmeans.setPreserveInstancesOrder(true);
kmeans.setNumClusters(numberOfClusters);
kmeans.buildClusterer(instances);
// This array holds the cluster number (starting with 0) for each instance
// The array has as many elements as there are instances
int[] assignments = kmeans.getAssignments();
int i = 0;
for (int clusterNum : assignments) {
    // %n ends the line so each mapping is printed on its own line
    System.out.printf("Instance %d -> Cluster %d%n", i, clusterNum);
    i++;
}
Aha, I think I found what I was looking for. Under the cluster visualizer, click "Save". This saves the whole data set as an ARFF file almost identical to the input file I provided, but with 2 new attributes: the first attribute is the index of the instance, while the last attribute is the cluster assignment. Now I just have to parse the crap out of it!
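For the parsing step, something along these lines might do. It is a rough sketch in plain Java (no Weka dependency) that assumes the saved file is laid out as described above: the first field of each data row is the instance index and the last field is the cluster label. The class name `ClusterArffSketch` and the sample rows are made up, and the parser does not handle quoted values that contain commas in the first or last field.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ClusterArffSketch {

    /** Maps the first field (instance index) to the last field (cluster) of each data row. */
    static Map<String, String> parseAssignments(List<String> lines) {
        Map<String, String> result = new LinkedHashMap<>();
        boolean inData = false;
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and ARFF comments.
            if (line.isEmpty() || line.startsWith("%")) {
                continue;
            }
            // Everything before the @data marker is header, not instances.
            if (!inData) {
                if (line.equalsIgnoreCase("@data")) {
                    inData = true;
                }
                continue;
            }
            int first = line.indexOf(',');
            int last = line.lastIndexOf(',');
            if (first < 0) {
                continue; // not a comma-separated data row
            }
            result.put(line.substring(0, first).trim(), line.substring(last + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Pass the saved ARFF path as the first argument; otherwise use made-up sample rows.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("@relation sample", "@data",
                        "0,5.1,3.5,cluster0", "1,6.2,2.9,cluster2");
        for (Map.Entry<String, String> e : parseAssignments(lines).entrySet()) {
            System.out.println("instance " + e.getKey() + " --> " + e.getValue());
        }
    }
}
```

Using a `LinkedHashMap` keeps the rows in file order, so the printed mapping follows the order of the instances in the saved file.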