Can't get scipy hierarchical clustering to work

Question

I wrote a simple script that is intended to do hierarchical clustering on a simple test dataset. The test data that was used.

I found the function fclusterdata to be a candidate to cluster my data into two clusters. It takes two mandatory call parameters: the data set and a threshold. The problem is, I couldn't find a threshold that would yield the expected two clusters.

I'd be happy if anyone can tell me what I am doing wrong. I'd also be happy if anyone could point on other approaches that would be better suited for my clustering (I explicitly want to avoid to specify the number of clusters beforehand.)

Here is my code:

import time
import scipy.cluster.hierarchy as hcluster
import numpy.random as random
import numpy

import pylab
pylab.ion()

data = random.randn(2,200)

data[:100,:100] += 10

for i in range(5,15):
    thresh = i/10.
    clusters = hcluster.fclusterdata(numpy.transpose(data), thresh)
    pylab.scatter(*data[:,:], c=clusters)
    pylab.axis("equal")
    title = "threshold: %f, number of clusters: %d" % (thresh, len(set(clusters)))
    print title
    pylab.title(title)
    pylab.draw()
    time.sleep(0.5)
    pylab.clf()

Here is the output:

threshold: 0.500000, number of clusters: 129
threshold: 0.600000, number of clusters: 129
threshold: 0.700000, number of clusters: 129
threshold: 0.800000, number of clusters: 75
threshold: 0.900000, number of clusters: 75
threshold: 1.000000, number of clusters: 73
threshold: 1.100000, number of clusters: 58
threshold: 1.200000, number of clusters: 1
threshold: 1.300000, number of clusters: 1
threshold: 1.400000, number of clusters: 1

Diego · Accepted Answer

Note that the function reference has an error. The correct definition of the t parameter is: "The cut-off threshold for the cluster function or the maximum number of clusters (criterion=’maxclust’)".

So try this:

clusters = hcluster.fclusterdata(numpy.transpose(data), 2, criterion='maxclust', metric='euclidean', depth=1, method='centroid')

Can't get scipy hierarchical clustering to work

Tags:

python

scipy

cluster-analysis

hierarchical-clustering

moooeeeep

1 Answers

Diego

Recent Activity

Donate For Us

Can't get scipy hierarchical clustering to work

Tags:

python

scipy

cluster-analysis

hierarchical-clustering

moooeeeep

1 Answers

Diego

Related questions

Recent Activity

Donate For Us