Display cluster labels for a scipy dendrogram

I'm using hierarchical clustering to cluster word vectors, and I want the user to be able to display a dendrogram showing the clusters. However, since there can be thousands of words, I want this dendrogram to be truncated to some reasonable size, with the label for each leaf being a string of the most significant words in that cluster.

My problem is that, according to the docs, "The labels[i] value is the text to put under the ith leaf node only if it corresponds to an original observation and not a non-singleton cluster." I take this to mean I can't label clusters, only individual points?

To illustrate, here is a short Python script which generates a simple labeled dendrogram:

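import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from matplotlib import pyplot as plt

# 20 random points in 3-D, clustered with Ward linkage
randomMatrix = np.random.uniform(-10, 10, size=(20, 3))
linked = linkage(randomMatrix, 'ward')

# one label per original observation
labelList = ["foo" for i in range(0, 20)]

plt.figure(figsize=(15, 12))
dendrogram(linked, labels=labelList)
plt.show()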

[Figure: a dendrogram of randomly generated points]

Now let's say I want to truncate to just 5 leaves, and label each leaf like "foo, foo, foo...", i.e. with the words that make up that cluster. (Note: generating these labels is not the issue here.) I truncate it, and supply a label list to match:

labelList = ["foo, foo, foo..." for i in range(0, 5)]

And here's the problem: the truncated dendrogram shows no labels.

I'm thinking there might be a use here for the parameter 'leaf_label_func' but I'm not sure how to use it.

1 Answer

You can simply write:

hierarchy.dendrogram(Z, labels=label_list)

Here is a good example using a pandas DataFrame:

import numpy as np
import pandas as pd
from scipy.cluster import hierarchy
import matplotlib.pyplot as plt

data = [[24, 16], [13, 4], [24, 11], [34, 18], [41, 6], [35, 13]]
frame = pd.DataFrame(np.array(data),
                     columns=["Rape", "Murder"],
                     index=["Atlanta", "Boston", "Chicago",
                            "Dallas", "Denver", "Detroit"])

# one row per city; the index supplies one label per original observation
Z = hierarchy.linkage(frame, 'single')
dn = hierarchy.dendrogram(Z, labels=frame.index)
plt.show()
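
For the truncated case in the question, labels only covers original observations, so the leaf_label_func parameter mentioned there is probably the better fit: scipy calls it with the cluster id of each leaf, including contracted non-singleton clusters. Below is a minimal sketch of that idea, not a definitive implementation. The names points, words and leaf_label are made up for illustration, the data is random, and taking the first few members of a cluster stands in for whatever "most significant words" logic you already have.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage, to_tree

# Stand-in data (assumption): 20 random 3-D points with hypothetical word labels.
rng = np.random.default_rng(0)
points = rng.uniform(-10, 10, size=(20, 3))
words = [f"word{i}" for i in range(20)]

Z = linkage(points, "ward")

# With rd=True, to_tree also returns a list in which entry k is the node for
# cluster id k (ids below 20 are the original observations).
_, nodes = to_tree(Z, rd=True)

def leaf_label(cluster_id, max_words=3):
    """Label a leaf with up to max_words of the words it contains."""
    members = nodes[cluster_id].pre_order()  # indices of original observations
    text = ", ".join(words[i] for i in members[:max_words])
    return text + ("..." if len(members) > max_words else "")

# no_plot=True returns the layout without drawing, so the labels can be inspected.
result = dendrogram(Z, truncate_mode="lastp", p=5,
                    leaf_label_func=leaf_label, no_plot=True)
print(result["ivl"])
```

Dropping no_plot=True (with matplotlib imported, followed by plt.show()) should draw the same strings under the five truncated leaves.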
