I am using the seaborn clustermap to create clusters, and visually it works great (this example produces very similar results). However, I am having trouble figuring out how to programmatically extract the clusters. For instance, in the example link, how could I find out that 1-1 rh, 1-1 lh, 5-1 rh, 5-1 lh make a good cluster? Visually it's easy. I have tried looking through the data and the dendrograms, but with little success.
EDIT Code from example:
import pandas as pd
import seaborn as sns

sns.set(font="monospace")

df = sns.load_dataset("brain_networks", header=[0, 1, 2], index_col=0)
used_networks = [1, 5, 6, 7, 8, 11, 12, 13, 16, 17]
used_columns = (df.columns.get_level_values("network")
                  .astype(int)
                  .isin(used_networks))
df = df.loc[:, used_columns]

network_pal = sns.cubehelix_palette(len(used_networks),
                                    light=.9, dark=.1, reverse=True,
                                    start=1, rot=-2)
network_lut = dict(zip(map(str, used_networks), network_pal))

networks = df.columns.get_level_values("network")
network_colors = pd.Series(networks).map(network_lut)

cmap = sns.diverging_palette(h_neg=210, h_pos=350, s=90, l=30, as_cmap=True)

result = sns.clustermap(df.corr(), row_colors=network_colors, method="average",
                        col_colors=network_colors, figsize=(13, 13), cmap=cmap)
How can I pull out which models are in which clusters from result?
EDIT2 The result does carry a linkage with it in result.dendrogram_col, which I THINK would work with fcluster. But the threshold value to use is confusing me. I would assume that values in the heatmap that are higher than the threshold get clustered together?
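For reference, this is roughly the kind of call I have been experimenting with (the threshold value here is just a guess on my part):

from scipy.cluster.hierarchy import fcluster

# result is the ClusterGrid returned by sns.clustermap above
cluster_labels = fcluster(result.dendrogram_col.linkage, t=0.7, criterion='distance')
print(cluster_labels)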
The clustermap() function of seaborn plots a hierarchically-clustered heat map of the given matrix dataset and returns a ClusterGrid object.
A dendrogram is a branching diagram that represents the relationships of similarity among a group of entities. Each branch is called a clade.
While using result.dendrogram_col.linkage or result.dendrogram_row.linkage will currently work, it relies on an implementation detail. The safest route is to first compute the linkages explicitly and pass them to the clustermap function, which has row_linkage and col_linkage parameters just for that.
Replacing the last line in your example (result = ...) with the following code gives the same result as before, but you will also have row_linkage and col_linkage variables that you can use with fcluster etc.
import numpy as np
from scipy.spatial import distance
from scipy.cluster import hierarchy

correlations = df.corr()
correlations_array = np.asarray(df.corr())

row_linkage = hierarchy.linkage(
    distance.pdist(correlations_array), method='average')

col_linkage = hierarchy.linkage(
    distance.pdist(correlations_array.T), method='average')

sns.clustermap(correlations, row_linkage=row_linkage, col_linkage=col_linkage,
               row_colors=network_colors, method="average",
               col_colors=network_colors, figsize=(13, 13), cmap=cmap)
In this particular example, the code could be simplified further, since the correlations array is symmetric and therefore row_linkage and col_linkage will be identical.
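As a rough sketch of the fcluster step mentioned above (the threshold of 1.0 and the grouping loop are illustrative choices, not part of the original example), the column linkage can be cut into flat clusters and the labels grouped by cluster id:

from scipy.cluster.hierarchy import fcluster

# Cut the dendrogram at a cophenetic-distance threshold; merges above this
# distance are not kept. The threshold applies to linkage distances, not to
# the correlation values shown in the heatmap.
cluster_ids = fcluster(col_linkage, t=1.0, criterion='distance')

# Group the column labels of the correlation matrix by cluster id.
clusters = {}
for label, cluster_id in zip(correlations.columns, cluster_ids):
    clusters.setdefault(cluster_id, []).append(label)

for cluster_id, members in sorted(clusters.items()):
    print(cluster_id, members)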
Note: a previous version of this answer included a call to distance.squareform, following what the seaborn code does internally, but that is a bug.