The following code generates a simple hierarchical cluster dendrogram with 10 leaf nodes:
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist
import matplotlib.pyplot as plt
X = np.random.randn(10, 2)
d = pdist(X)
Z = sch.linkage(d, method='complete')
P = sch.dendrogram(Z)
plt.show()
I generate three flat clusters like so:
T = sch.fcluster(Z, 3, 'maxclust')
# array([3, 1, 1, 2, 2, 2, 2, 2, 1, 2])
However, I'd like to see the cluster labels 1,2,3 on the dendrogram. It's easy for me to visualize with just 10 leaf nodes and three clusters, but when I have 1000 nodes and 10 clusters, I can't see what's going on.
How do I show the cluster numbers on the dendrogram? I'm open to other packages. Thanks.
In the dendrogram, locate the largest vertical gap between successive merges and draw a horizontal line through the middle of it. The number of vertical lines that this line intersects is a commonly used heuristic for the number of clusters (for the distance measure and the method set in linkage).
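The heuristic above can also be computed directly from the linkage matrix instead of being read off the plot. This is only a sketch under a few assumptions: the data are random points, the seed is fixed purely for reproducibility, and the variable names (`heights`, `gaps`, `cut`, `n_clusters`) are made up for illustration.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)            # fixed seed, only for reproducibility
X = rng.standard_normal((10, 2))          # 10 random 2-D points (illustrative data)
Z = sch.linkage(pdist(X), method='complete')

heights = Z[:, 2]                         # merge heights, non-decreasing for complete linkage
gaps = np.diff(heights)                   # vertical distance between consecutive merges
i = int(np.argmax(gaps))                  # the largest gap lies between merges i and i+1
cut = (heights[i] + heights[i + 1]) / 2   # horizontal line through the middle of that gap

# Merges i+1 .. end lie above the line; undoing them leaves this many clusters.
n_clusters = len(heights) - i

# Cutting at that height gives the corresponding flat clustering.
T = sch.fcluster(Z, cut, criterion='distance')
print(n_clusters, cut, T)
```

Passing `cut` as `color_threshold` to `sch.dendrogram` should color the same groups, since links at or above the threshold keep the default color.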
A dendrogram is a diagram that shows the hierarchical relationship between objects. It is most commonly created as an output from hierarchical clustering. The main use of a dendrogram is to work out the best way to allocate objects to clusters.
scipy.cluster.hierarchy: These functions cut hierarchical clusterings into flat clusterings or find the roots of the forest formed by a cut by providing the flat cluster ids of each observation.
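As a rough illustration of both halves of that description, `fcluster` does the cutting and `leaders` reports the root node of each flat cluster. The data and seed below are assumptions chosen only to make the snippet reproducible.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)            # fixed seed, only for reproducibility
X = rng.standard_normal((10, 2))          # 10 random 2-D points (illustrative data)
Z = sch.linkage(pdist(X), method='complete')

# Cut the hierarchy into at most 3 flat clusters: one cluster id per observation.
T = sch.fcluster(Z, 3, criterion='maxclust')

# For each flat cluster, find the tree node that is its root.
# Node ids below len(X) are original observations; larger ids are merges.
roots, ids = sch.leaders(Z, T)
for node, cid in zip(roots, ids):
    kind = 'leaf' if node < len(X) else 'internal node'
    print(f'cluster {cid}: rooted at {kind} {node}')
```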
Here is a solution that appropriately colors the clusters and labels the leaves of the dendrogram with the appropriate cluster name (leaves are labeled: 'point number, cluster number'). These techniques can be used independently or together. I modified your original example to include both:
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist
import matplotlib.pyplot as plt
n = 10
k = 3
X = np.random.randn(n, 2)
d = pdist(X)
Z = sch.linkage(d, method='complete')
T = sch.fcluster(Z, k, 'maxclust')
# calculate labels: 'point number,cluster number' for each leaf
labels = [str(i) + ',' + str(T[i]) for i in range(n)]
# calculate color threshold: the height of the (k-1)-th merge from the top;
# links at or above it keep the default color, links below are colored by subtree
ct = Z[-(k-1), 2]
# plot
P = sch.dendrogram(Z, labels=labels, color_threshold=ct)
plt.show()
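With 1000 leaves the individual labels become unreadable, so it may help to summarize which stretch of the x-axis each cluster occupies. The sketch below is not part of the answer above; it assumes random data with a fixed seed, and `ordered`, `starts` and `ends` are illustrative names. It relies on each flat cluster being a subtree, so its leaves sit next to each other in the dendrogram.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)            # fixed seed, only for reproducibility
n, k = 1000, 10
X = rng.standard_normal((n, 2))           # illustrative data
Z = sch.linkage(pdist(X), method='complete')
T = sch.fcluster(Z, k, criterion='maxclust')

# no_plot=True computes the dendrogram layout without drawing anything.
P = sch.dendrogram(Z, no_plot=True)
leaves = np.array(P['leaves'])            # observation index of each leaf, left to right
ordered = T[leaves]                       # flat cluster id of each leaf, left to right

# Find where the cluster id changes along the leaf order.
starts = np.flatnonzero(np.r_[True, ordered[1:] != ordered[:-1]])
ends = np.r_[starts[1:], n]
for s, e in zip(starts, ends):
    print(f'cluster {ordered[s]}: leaf positions {s}..{e - 1} ({e - s} points)')
```

In SciPy's default layout the leaves are drawn at x = 5, 15, 25, and so on, so the midpoint of each run could be turned into a single tick label per cluster when plotting with `no_labels=True`.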