Display cluster labels for a scipy dendrogram

Tags: python, matplotlib, scipy, dendrogram

I'm using hierarchical clustering to cluster word vectors, and I want the user to be able to display a dendrogram showing the clusters. However, since there can be thousands of words, I want this dendrogram to be truncated to some reasonable size, with the label for each leaf being a string of the most significant words in that cluster.

My problem is that, according to the docs, "The labels[i] value is the text to put under the ith leaf node only if it corresponds to an original observation and not a non-singleton cluster." I take this to mean that I can only label individual points, not clusters. Is that right?

To illustrate, here is a short Python script which generates a simple labeled dendrogram:

import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from matplotlib import pyplot as plt

randomMatrix = np.random.uniform(-10,10,size=(20,3))
linked = linkage(randomMatrix, 'ward')

labelList = ["foo" for i in range(0, 20)]

plt.figure(figsize=(15, 12))
dendrogram(
            linked,
            orientation='right',
            labels=labelList,
            distance_sort='descending',
            show_leaf_counts=False
          )
plt.show()

[Image: a dendrogram of randomly generated points]

Now let's say I want to truncate to just 5 leaves and label each leaf like "foo, foo, foo...", i.e. with the words that make up that cluster. (Note: generating these labels is not the issue here.) I truncate it and supply a label list to match:

labelList = ["foo, foo, foo..." for i in range(0, 5)]
dendrogram(
            linked,
            orientation='right',
            p=5,
            truncate_mode='lastp',
            labels=labelList,
            distance_sort='descending',
            show_leaf_counts=False
          )

And here's the problem: the resulting plot has no labels.

I think the 'leaf_label_func' parameter might be useful here, but I'm not sure how to use it.

asked Mar 08 '16 16:03

EmmetOT

1 Answer

You can simply write:

hierarchy.dendrogram(Z, labels=label_list)

Here is an example using a pandas DataFrame:

import numpy as np
import pandas as pd
from scipy.cluster import hierarchy
import matplotlib.pyplot as plt

# One row per city; the index holds the label for each observation.
data = [[24, 16], [13, 4], [24, 11], [34, 18], [41, 6], [35, 13]]
frame = pd.DataFrame(
    np.array(data),
    columns=["Rape", "Murder"],
    index=["Atlanta", "Boston", "Chicago", "Dallas", "Denver", "Detroit"],
)

Z = hierarchy.linkage(frame, 'single')
plt.figure()
# Pass the index as a plain list, one label per original observation.
dn = hierarchy.dendrogram(Z, labels=frame.index.tolist())
plt.show()
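For the truncated plot in the question, labels is unlikely to help on its own, because (per the docs quoted in the question) it only applies to leaves that are original observations. One possible approach, sketched below, is to use leaf_label_func instead: dendrogram calls it with the cluster id of each leaf, where ids below n are original observations and ids from n upward are merged clusters. The names words and cluster_label are hypothetical, and taking the first three members of each cluster is only a stand-in for however the "most significant words" are really chosen.

```python
import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage, to_tree

# Same kind of random data as in the question (seeded for repeatability).
rng = np.random.default_rng(0)
points = rng.uniform(-10, 10, size=(20, 3))
linked = linkage(points, 'ward')

# Hypothetical per-observation labels; replace with the real words.
words = ["word%d" % i for i in range(len(points))]

# nodes[k] is the ClusterNode with cluster id k: ids 0..n-1 are the original
# observations, ids n..2n-2 are the clusters created by successive merges.
_, nodes = to_tree(linked, rd=True)


def cluster_label(k, max_words=3):
    """Label leaf k with the first few words of the cluster it represents."""
    members = nodes[k].pre_order()  # ids of the original observations in cluster k
    shown = ", ".join(words[i] for i in members[:max_words])
    return (shown + "...") if len(members) > max_words else shown


plt.figure(figsize=(15, 12))
result = dendrogram(
    linked,
    orientation='right',
    p=5,
    truncate_mode='lastp',
    leaf_label_func=cluster_label,  # used instead of labels=
    distance_sort='descending',
)
```

Call plt.show() afterwards to display the figure, as in the question's script. The strings that were drawn are also available in result['ivl'], which is handy for checking the labels without looking at the plot.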

answered Sep 21 '22 00:09

Mohammad Forouhesh
