
Plotting sorted heatmap keeping (x,y) value colors

I've been working with Python, pandas and seaborn to get a heatmap with a different colormap for each column. Thanks to this question I did the following:

Sample Dataframe (sample.csv):

X,a,b,c
A,0.5,0.7,0.4
B,0.9,0.3,0.8
C,0.3,0.4,0.7

Plot Heatmap with Seaborn

import pandas as pd
import matplotlib as mpl
# Set new Backend to Use Seaborn
# mpl.use('Agg')
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import colorsys

# Working example data:
df = pd.DataFrame([[0.5,0.7,0.4],[.9,.3,.8],[.3,.4,.7]],['A','B','C'])    

# Get Color List
N = 3
COL = [colorsys.hsv_to_rgb(x*1.0/N, 0.7, 0.5) for x in range(N)]

with sns.axes_style('white'):

    for i, name in enumerate(df.columns):

        # Create cmap
        colors = COL[i]
        cmap = sns.light_palette(colors, input='rgb', reverse=False, as_cmap=True)

        sns.heatmap(df.mask(df.isin(df[name])!=1),
                    cbar=False,
                    square=True,
                    annot=False,
                    cmap=cmap,
                    linewidths=0.1)
plt.show()

This produces a heatmap with a different colormap for each column (the values are shown only to clarify the problem):

[image: heatmap with one colormap per column]

Now I would like to produce the same plot from the dataframe with the values of each row sorted in descending order, like:

X,col1,col2,col3
A,0.7,0.5,0.4
B,0.9,0.8,0.3
C,0.7,0.4,0.3

Each cell should keep the original color of its (index, column) pair, as in the following draft of the expected output (the values are indicative; what I need are only the colors):

[image: draft of the expected heatmap for the sorted dataframe]

EDIT:

Fixed some typos, now df is the dataframe representing the working matrix.

Fabio Lamanna asked Dec 03 '15 16:12



2 Answers

You could iterate over the array once, look up the color corresponding to each value, and store the colors in an NxMx4 (RGBA image) array. Then sort the array and the image in the same manner, e.g. get the sort indices from the original array and apply them to the image array. You can then display the image with plt.imshow and use matplotlib to add labels, ticks, etc.

This could look like the following. First create an NxMx4 array to store the colors (four channels, because a matplotlib colormap returns RGBA values):

im = np.zeros((df.shape[0], df.shape[1], 4))

You can then iterate over every column and scale its values to the range 0 to 1, which is what a colormap expects for float input, e.g.

color_index = (value - min(column)) / (max(column) - min(column))

then you can use

color = cmap(color_index)

im[row_index, col_index, :] = color

When you have iterated over every column, all the colors are stored in im.
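As a minimal sketch of the scaling and lookup steps for a single column: the colormap below is a hypothetical stand-in built with matplotlib's LinearSegmentedColormap.from_list (white to the base color) rather than sns.light_palette, and the guard for a constant column is an addition that is not part of the original answer.

```python
import colorsys

import numpy as np
from matplotlib.colors import LinearSegmentedColormap

# Base color of the first column, computed as in the question's COL list
base_rgb = colorsys.hsv_to_rgb(0.0, 0.7, 0.5)

# Hypothetical stand-in for sns.light_palette: fades from white to base_rgb
cmap = LinearSegmentedColormap.from_list("light_base", [(1.0, 1.0, 1.0), base_rgb])

column = np.array([0.5, 0.9, 0.3])  # first column of the sample data

# Min-max scale to [0, 1]; a constant column would divide by zero, so guard it
span = column.max() - column.min()
scaled = (column - column.min()) / span if span > 0 else np.zeros_like(column)

# Calling the colormap on floats in [0, 1] returns one RGBA row per value
rgba = cmap(scaled)
print(rgba.shape)
```

The smallest value of the column gets the light end of the colormap and the largest gets the base color, so each column uses its full color range regardless of its absolute values.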

The resulting code would be:

import pandas as pd
import matplotlib as mpl
# Set new Backend to Use Seaborn
# mpl.use('Agg')
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import colorsys
import numpy as np

# Working example data:
df = pd.DataFrame([[0.5,0.7,0.4],[.9,.3,.8],[.3,.4,.7]],['A','B','C'])    

# Get Color List
N = 3
COL = [colorsys.hsv_to_rgb(x*1.0/N, 0.7, 0.5) for x in range(N)]

im = np.zeros((df.shape[0], df.shape[1], 4))

with sns.axes_style('white'):

    for i, name in enumerate(df.columns):

        # Create cmap
        colors = COL[i]
        cmap = sns.light_palette(colors, input='rgb', reverse=False, as_cmap=True)
        values = np.array(df[name])
        color_indices = (values-np.min(values))/(np.max(values)-np.min(values))
        im[:,i,:] = cmap(color_indices)

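# Reorder the colors of every row in the same way its values get sorted
im2 = im.copy()
for i, name in enumerate(df.index):
    values = df.loc[name].to_numpy()
    sorting = np.argsort(values)  # indices for ascending order
    # Writing into the reversed slice puts the largest value on the left
    im2[i, ::-1, :] = im[i, sorting, :]
plt.imshow(im2, interpolation="nearest")
plt.grid(False)
plt.show()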
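The row-by-row reordering can also be written without an explicit loop. This is only a sketch under assumptions: the color array here is a dummy whose first channel stores the original column index (so the reordering is easy to inspect), and np.take_along_axis is swapped in for the loop used above.

```python
import numpy as np

# Sample values from the question (rows A, B, C)
values = np.array([[0.5, 0.7, 0.4],
                   [0.9, 0.3, 0.8],
                   [0.3, 0.4, 0.7]])

# Dummy RGBA image: channel 0 holds the original column index of each cell
colors = np.zeros(values.shape + (4,))
colors[..., 0] = np.arange(values.shape[1])
colors[..., 3] = 1.0

# Indices that sort every row in descending order
order = np.argsort(-values, axis=1)

# Apply the same indices to the values and to the colors
sorted_values = np.take_along_axis(values, order, axis=1)
sorted_colors = np.take_along_axis(colors, order[..., None], axis=1)

print(sorted_values)
print(sorted_colors[..., 0])  # original column index of each sorted cell
```

Because the values and the colors are indexed with the same order array, every cell keeps the color it had at its original (index, column) position.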
Randrian answered Oct 29 '22 02:10



With seaborn's heatmap you just need to provide the different colormaps and, independently of the order, set vmin and vmax. From the documentation:

vmin, vmax : floats, optional

Values to anchor the colormap, otherwise they are inferred from the data and other keyword arguments.

This means that you should not need to specify the min/max values unless you want them to lie outside the range of your data points.
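What anchoring changes can be sketched with matplotlib's Normalize, which is the kind of object that maps data values onto the 0 to 1 range of a colormap. This assumes heatmap hands vmin and vmax on to matplotlib's normalization; the limits 0 and 1 and the sample column are illustrative choices only.

```python
import numpy as np
from matplotlib.colors import Normalize

column = np.array([0.5, 0.9, 0.3])  # one column of the sample data

# Limits taken from the data: the column is stretched over the whole colormap
inferred = Normalize(vmin=column.min(), vmax=column.max())

# Limits anchored explicitly: a value maps to the same position in any column
anchored = Normalize(vmin=0.0, vmax=1.0)

inferred_pos = np.asarray(inferred(column))
anchored_pos = np.asarray(anchored(column))

print(inferred_pos)
print(anchored_pos)
```

In both cases the position of a value on the colormap depends only on the value and the limits, not on where the value sits in the array, so reordering the data does not change its color as long as the limits stay the same.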

rll answered Oct 29 '22 01:10
