Getting Xmeans clusterer output programmatically in Weka

When using SimpleKMeans in Weka, one can call getAssignments() on the trained model to get the cluster assignment for each instance. Here's a (truncated) Jython example:

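>>> from weka.clusterers import SimpleKMeans
>>> kmeans = SimpleKMeans()
>>> kmeans.setPreserveInstancesOrder(True)  # getAssignments() throws an exception without this
>>> kmeans.buildClusterer(data)  # data: a weka.core.Instances object loaded earlier
>>> assignments = kmeans.getAssignments()
>>> assignments
array('i',[14, 16, 0, 0, 0, 0, 16,...])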

Each position in the array is an instance index, and the value at that position is the instance's cluster number. So, instance 0 is in cluster 14, instance 1 is in cluster 16, and so on.
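
To make the index-to-cluster correspondence concrete, here's a minimal sketch in plain Python that groups instance indices by cluster number. The list reuses the first seven values from the truncated output above; the helper name `group_by_cluster` is made up for illustration and isn't part of Weka.

```python
from collections import defaultdict

def group_by_cluster(assignments):
    """Map each cluster number to the list of instance indices assigned to it."""
    groups = defaultdict(list)
    for index, cluster in enumerate(assignments):
        groups[cluster].append(index)
    return dict(groups)

# First seven values from the truncated output above
assignments = [14, 16, 0, 0, 0, 0, 16]
groups = group_by_cluster(assignments)
```

Here `groups[16]` holds the indices of the instances in cluster 16, which makes it easy to pull out the members of any one cluster.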

My question is: Is there something similar for Xmeans? I've gone through the entire API here and don't see anything like that.

Renklauf asked Sep 16 '12 23:09


1 Answer

Here's a reply to my question from the Weka listserv:

 "Not as such. But all clusterers have a clusterInstance() method. You can 
 pass each training instance through the trained clustering model to 
 obtain the cluster index for each."

Here's my Jython implementation of this suggestion:

 >>> from java.io import BufferedReader, FileReader
 >>> from weka.core import Instances
 >>> from weka.clusterers import XMeans
 >>> reader = BufferedReader(FileReader("some arff file"))
 >>> data = Instances(reader)  # load the ARFF data
 >>> reader.close()
 >>> xmeans = XMeans()
 >>> xmeans.setMinNumClusters(2)
 >>> xmeans.setMaxNumClusters(100)
 >>> xmeans.buildClusterer(data)  # here's our model
 >>> instances = data.enumerateInstances()  # Java Enumeration over the training instances
 >>> for index, instance in enumerate(instances):  # enumerate() supplies the index
 ...     cluster_num = xmeans.clusterInstance(instance)  # pass each instance through the model
 ...     print "instance #", index, "is in cluster ", cluster_num  # pretty-print results

 instance # 0 is in cluster  1
 instance # 1 is in cluster  1
 instance # 2 is in cluster  0
 instance # 3 is in cluster  0

I'm leaving all of this up as a reference, since the same approach could be used to get cluster assignments from any of Weka's clusterers.
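
As a rough sketch of that generalization, the loop can be wrapped in a small helper that relies only on the clusterInstance() method mentioned in the listserv reply. The helper name `cluster_assignments` and the `ThresholdClusterer` stand-in are hypothetical and not part of Weka; the stand-in exists only so the snippet runs without Weka on the classpath.

```python
def cluster_assignments(clusterer, instances):
    """Return one cluster index per instance, in iteration order."""
    return [clusterer.clusterInstance(instance) for instance in instances]


class ThresholdClusterer(object):
    """Hypothetical stand-in for a trained clusterer (not a Weka class)."""

    def __init__(self, threshold):
        self.threshold = threshold

    def clusterInstance(self, instance):
        # Toy rule: values at or above the threshold go to cluster 1
        return 1 if instance >= self.threshold else 0


toy_data = [0.2, 0.9, 0.4, 0.7]
assignments = cluster_assignments(ThresholdClusterer(0.5), toy_data)
for index, cluster_num in enumerate(assignments):
    print("instance # %d is in cluster %d" % (index, cluster_num))
```

With a real trained model, I'd expect `cluster_assignments(xmeans, data.enumerateInstances())` to return the same cluster numbers the loop above prints, though I've only reasoned that through rather than run it against every clusterer.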

Renklauf answered Sep 21 '22 17:09