Weka simple K-means clustering assignments

Tags: cluster-analysis, k-means, data-mining, weka

I have what feels like a simple problem, but I can't seem to find an answer. I'm pretty new to Weka, but I feel like I've done a bit of research on this (at least read through the first couple of pages of Google results) and come up dry.

I am using Weka to run clustering with Simple K-Means. In the results list I have no problem visualizing my output ("Visualize cluster assignments"), and it is clear both from my understanding of the K-Means algorithm and from Weka's output that each of my instances ends up as a member of one of the clusters (centered around a particular centroid, if you will).

I can see something of the cluster composition from the text output. However, Weka gives me no explicit "mapping" from instance number to cluster number. I would like something like:

instance 1 --> cluster 0
instance 2 --> cluster 0
instance 3 --> cluster 2
instance 4 --> cluster 1
... etc.

How do I obtain these results without calculating the distance from each item to each centroid on my own?

asked Jul 13 '11 21:07

machine yearning

2 Answers

I had the same problem and figured it out. I am posting the method here in case anyone needs it:

It's actually quite simple: you have to use Weka's Java API.

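import weka.clusterers.SimpleKMeans;
import weka.core.Instances;

// Assumes `instances` (weka.core.Instances) and `numberOfClusters` (int) are already defined.
SimpleKMeans kmeans = new SimpleKMeans();
kmeans.setSeed(10);

// This is the important parameter to set: assignments are only recorded
// when the order of the instances is preserved.
kmeans.setPreserveInstancesOrder(true);
kmeans.setNumClusters(numberOfClusters);
kmeans.buildClusterer(instances);

// getAssignments() returns the cluster number (starting with 0) for each instance.
// The array has as many elements as there are instances.
int[] assignments = kmeans.getAssignments();

int i = 0;
for (int clusterNum : assignments) {
    // %n ends each line so the mappings do not run together
    System.out.printf("Instance %d -> Cluster %d%n", i, clusterNum);
    i++;
}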
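For intuition about what the assignments array holds, here is a minimal plain-Java sketch of the nearest-centroid step of K-means. It is not Weka's code: the class name, method name, and sample points are made up for illustration, and it assumes purely numeric attributes and squared Euclidean distance.

```java
public class NearestCentroidSketch {
    // For each point, returns the index of the closest centroid
    // (squared Euclidean distance; ties go to the lower index).
    public static int[] assign(double[][] points, double[][] centroids) {
        int[] assignments = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            double best = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.length; c++) {
                double dist = 0.0;
                for (int d = 0; d < points[i].length; d++) {
                    double diff = points[i][d] - centroids[c][d];
                    dist += diff * diff;
                }
                if (dist < best) {
                    best = dist;
                    assignments[i] = c;
                }
            }
        }
        return assignments;
    }

    public static void main(String[] args) {
        // Hypothetical data, chosen only to illustrate the mapping.
        double[][] points = {{1.0, 1.0}, {1.2, 0.8}, {9.0, 9.0}, {5.0, 5.1}};
        double[][] centroids = {{1.0, 1.0}, {5.0, 5.0}, {9.0, 9.0}};
        int[] assignments = assign(points, centroids);
        for (int i = 0; i < assignments.length; i++) {
            System.out.printf("instance %d --> cluster %d%n", i + 1, assignments[i]);
        }
    }
}
```

Weka's own distance function may normalize attributes and treat nominal or missing values differently, so a hand-rolled computation like this is not guaranteed to reproduce getAssignments() exactly. That is one more reason to let Weka report the assignments.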

answered Oct 31 '22 12:10

amon.gammon

Aha, I think I found what I was looking for. Under the cluster visualizer, click "Save". This saves the whole data set as an ARFF file almost identical to the input file I provided, but with two new attributes: the first attribute is the index of the instance, and the last attribute is the cluster assignment. Now I just have to parse the crap out of it!
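The parsing step could look roughly like the sketch below, which uses only the Java standard library. It assumes a dense (non-sparse) ARFF file in which neither the first nor the last field contains a quoted comma. The class and method names are hypothetical, and the attribute names and cluster0-style labels in the sample are assumptions about what the saved file looks like.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

public class SavedArffClusters {
    // Maps the first field of each data row (instance index)
    // to the last field (cluster label), in file order.
    public static Map<String, String> readAssignments(BufferedReader reader) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        boolean inData = false;
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comments
            }
            if (!inData) {
                // Everything before @data is header (relation and attributes).
                if (line.equalsIgnoreCase("@data")) {
                    inData = true;
                }
                continue;
            }
            int first = line.indexOf(',');
            int last = line.lastIndexOf(',');
            if (first < 0) {
                continue; // not a multi-attribute data row
            }
            result.put(line.substring(0, first).trim(), line.substring(last + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample standing in for the file saved from the visualizer.
        String sample = String.join("\n",
                "@relation sample_clustered",
                "@attribute Instance_number numeric",
                "@attribute x numeric",
                "@attribute Cluster {cluster0,cluster1,cluster2}",
                "@data",
                "0,1.0,cluster0",
                "1,9.0,cluster2");
        Map<String, String> mapping = readAssignments(new BufferedReader(new StringReader(sample)));
        mapping.forEach((instance, cluster) ->
                System.out.println("instance " + instance + " --> " + cluster));
    }
}
```

To read a real file, pass `java.nio.file.Files.newBufferedReader(java.nio.file.Paths.get(path))` instead of the StringReader.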

answered Oct 31 '22 14:10

machine yearning
