
How to give sns.clustermap a precomputed distance matrix?


Usually when I do dendrograms and heatmaps, I use a distance matrix and do a bunch of SciPy stuff. I want to try out Seaborn, but Seaborn wants my data in rectangular form (rows = samples, columns = attributes), not as a distance matrix.

I essentially want to use seaborn as the backend to compute my dendrogram and tack it onto my heatmap. Is this possible? If not, could this become a feature in the future?

Maybe there are parameters I can adjust so it can take a distance matrix instead of a rectangular matrix?

Here's the usage:

    seaborn.clustermap(data, pivot_kws=None, method='average', metric='euclidean',
                       z_score=None, standard_scale=None, figsize=None, cbar_kws=None,
                       row_cluster=True, col_cluster=True, row_linkage=None,
                       col_linkage=None, row_colors=None, col_colors=None,
                       mask=None, **kwargs)

My code below:

    import pandas as pd
    from sklearn.datasets import load_iris

    iris = load_iris()
    X, y = iris.data, iris.target
    DF = pd.DataFrame(
        X,
        index=["iris_%d" % i for i in range(X.shape[0])],
        columns=iris.feature_names,
    )


I don't think my method below is correct, because I'm giving it a precomputed distance matrix and NOT a rectangular data matrix as it requests. There are no examples of how to use a correlation/distance matrix with clustermap. There is one for sns.heatmap (https://stanford.edu/~mwaskom/software/seaborn/examples/network_correlations.html), but the plain sns.heatmap function does not cluster the ordering.

    import seaborn as sns

    DF_corr = DF.T.corr()
    DF_dism = 1 - DF_corr
    sns.clustermap(DF_dism)



asked Aug 01 '16 18:08

O.rka

1 Answer

You can compute the linkage from the precomputed distance matrix and pass it to clustermap() via row_linkage and col_linkage:

    import pandas as pd, seaborn as sns
    import scipy.spatial as sp, scipy.cluster.hierarchy as hc
    from sklearn.datasets import load_iris
    sns.set(font="monospace")

    iris = load_iris()
    X, y = iris.data, iris.target
    DF = pd.DataFrame(X, index=["iris_%d" % (i) for i in range(X.shape[0])],
                      columns=iris.feature_names)

    DF_corr = DF.T.corr()
    DF_dism = 1 - DF_corr   # distance matrix
    # squareform() converts the square matrix to the condensed form linkage() expects
    linkage = hc.linkage(sp.distance.squareform(DF_dism), method='average')
    sns.clustermap(DF_dism, row_linkage=linkage, col_linkage=linkage)
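If the distance matrix carries small floating-point asymmetries or a diagonal that is not exactly zero, the validity checks in squareform() may reject it. Here is a minimal sketch of a wrapper that cleans the matrix up first. The name linkage_from_distance and the synthetic data are assumptions for illustration, not part of seaborn or SciPy:

```python
import numpy as np
import scipy.cluster.hierarchy as hc
from scipy.spatial.distance import squareform


def linkage_from_distance(dist, method="average"):
    """Build a linkage matrix from a square distance matrix (hypothetical helper)."""
    d = np.asarray(dist, dtype=float)
    if d.ndim != 2 or d.shape[0] != d.shape[1]:
        raise ValueError("expected a square distance matrix")
    # Average with the transpose to remove tiny asymmetries from rounding.
    d = (d + d.T) / 2.0
    # Force an exact zero diagonal so squareform's validity checks pass.
    np.fill_diagonal(d, 0.0)
    # squareform converts the square matrix to the condensed vector linkage expects.
    return hc.linkage(squareform(d), method=method)


# Small synthetic example: 1 - correlation between 6 random samples.
rng = np.random.default_rng(0)
data = rng.normal(size=(6, 4))
dist = 1.0 - np.corrcoef(data)
Z = linkage_from_distance(dist)
```

The result could then be passed the same way as above, e.g. sns.clustermap(DF_dism, row_linkage=Z, col_linkage=Z), since np.asarray also accepts a DataFrame.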

With clustermap(distance_matrix) (i.e., without a linkage passed), the linkage is calculated internally from the pairwise distances between the rows and between the columns of the distance matrix (see the note below for full details), instead of using the elements of the distance matrix directly, which is the correct solution. As a result, the output of the code above is somewhat different from the one in the question.
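To see the difference without plotting anything, here is a small sketch with a toy distance matrix (the four points are made up for illustration). It compares the condensed distances taken directly from the matrix with those obtained when each row of the matrix is treated as an observation:

```python
import numpy as np
import scipy.cluster.hierarchy as hc
from scipy.spatial.distance import pdist, squareform

# Four points on a line; D holds their true pairwise distances.
points = np.array([[0.0], [1.0], [3.0], [7.0]])
D = squareform(pdist(points))

# What we want: use the entries of D directly as the distances.
direct = squareform(D)

# What happens when D is treated as a data matrix: each row of D becomes a
# "point", and Euclidean distances between those rows are computed.
rows_as_points = pdist(D, metric="euclidean")

Z_direct = hc.linkage(direct, method="average")
Z_rows = hc.linkage(rows_as_points, method="average")

print(direct)
print(rows_as_points)
```

The two condensed vectors differ, so the merge heights in Z_direct and Z_rows differ too. On real data the merge order can change as well.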

Note: if no row_linkage is passed to clustermap(), the row linkage is determined internally by considering each row a "point" (observation) and calculating the pairwise distances between the points, so the row dendrogram reflects row similarity. The same holds for col_linkage, where each column is considered a point. This explanation should probably be added to the docs. Here is the docs' first example, modified to make the internal linkage calculation explicit:

    import seaborn as sns; sns.set()
    import scipy.spatial as sp, scipy.cluster.hierarchy as hc

    flights = sns.load_dataset("flights")
    # keyword arguments: recent pandas versions no longer accept these positionally
    flights = flights.pivot(index="month", columns="year", values="passengers")

    # one linkage for the rows, one for the columns (transposed data)
    row_linkage, col_linkage = (
        hc.linkage(sp.distance.pdist(x), method='average')
        for x in (flights.values, flights.values.T)
    )

    g = sns.clustermap(flights, row_linkage=row_linkage, col_linkage=col_linkage)
    # note: this produces the same plot as "sns.clustermap(flights)", where
    # clustermap() calculates the row and column linkages internally
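The equivalence claimed in that last comment can be checked on the SciPy side alone. This sketch assumes that clustermap(), when given no linkage, ends up calling SciPy's linkage() on the data with its default method='average' and metric='euclidean'. Random data stands in for flights.values:

```python
import numpy as np
import scipy.cluster.hierarchy as hc
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
X = rng.normal(size=(12, 5))  # synthetic stand-in for flights.values

# Explicit route: condensed Euclidean distances, then average linkage.
Z_explicit = hc.linkage(pdist(X), method="average")

# One-step route: let linkage compute the distances from the observations.
Z_one_step = hc.linkage(X, method="average", metric="euclidean")

same_rows = np.allclose(Z_explicit, Z_one_step)

# Columns are handled the same way on the transposed data.
Z_cols = hc.linkage(pdist(X.T), method="average")
```

Here same_rows should come out True, because linkage() computes pdist() itself when it is given a 2-D array of observations.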


answered Sep 17 '22 13:09

Ulrich Stern
