I generated a dendrogram plot for my dataset and I am not happy how the splits at some levels have been ordered. I am thus looking for a way to swap the two branches (or leaves) of a single split.
If we look at the code and dendrogram plot at the bottom, there are two labels 11
and 25
split away from the rest of the big cluster. I am really unhappy about this, and would like that the branch with 11
and 25
to be the right branch of the split and the rest of the cluster to be the left branch. The shown distances would still be the same, and thus the data would not be changed, just the aesthetics.
Can this be done? And how? I am specifically for a manual intervention because the optimal leaf ordering algorithm supposedly does not work in this case.
import numpy as np
# random data set with two clusters
np.random.seed(65) # for repeatability of this tutorial
a = np.random.multivariate_normal([10, 0], [[3, 1], [1, 4]], size=[10,])
b = np.random.multivariate_normal([0, 20], [[3, 1], [1, 4]], size=[20,])
X = np.concatenate((a, b),)
# create linkage and plot dendrogram
from scipy.cluster.hierarchy import dendrogram, linkage
Z = linkage(X, 'ward')
plt.figure(figsize=(15, 5))
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('sample index')
plt.ylabel('distance')
dendrogram(
Z,
leaf_rotation=90., # rotates the x axis labels
leaf_font_size=12., # font size for the x axis labels
)
plt.show()
I had a similar problem and got solved by using optimal_ordering option in linkage. I attach the code and result for your case, which might not be exactly what you like but seems highly improved to me.
import numpy as np
import matplotlib.pyplot as plt
# random data set with two clusters
np.random.seed(65) # for repeatability of this tutorial
a = np.random.multivariate_normal([10, 0], [[3, 1], [1, 4]], size=[10,])
b = np.random.multivariate_normal([0, 20], [[3, 1], [1, 4]], size=[20,])
X = np.concatenate((a, b),)
# create linkage and plot dendrogram
from scipy.cluster.hierarchy import dendrogram, linkage
Z = linkage(X, 'ward', optimal_ordering = True)
plt.figure(figsize=(15, 5))
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('sample index')
plt.ylabel('distance')
dendrogram(
Z,
leaf_rotation=90., # rotates the x axis labels
leaf_font_size=12., # font size for the x axis labels
distance_sort=False,
show_leaf_counts=True,
count_sort=False
)
plt.show()
result of using optimal_ordering in linkage
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With