
Green to red colormap in matplotlib, centered on the median of the data

In my application I'm transitioning from R to native Python (scipy + matplotlib) where possible, and one of the biggest tasks was converting an R heatmap to a matplotlib heatmap. This post guided me through the porting. While most of it was painless, I'm still not convinced about the colormap.

Before showing code, an explanation: in the R code I defined "breaks", i.e. a fixed number of points starting from the lowest value up to 10, and ideally centered on the median value of the data. Its equivalent here would be with numpy.linspace:

# Matrix is a DataFrame object from pandas
import numpy as np

data_min = min(matrix.min(skipna=True))
data_max = max(matrix.max(skipna=True))
median_value = np.median(matrix.median(skipna=True))

range_min = np.linspace(0, median_value, 50)
range_max = np.linspace(median_value, data_max, 50)
breaks = np.concatenate((range_min, range_max))

This gives us 100 points that will be used for coloring. However, I'm not sure how to do the exact same thing with a matplotlib colormap. Currently I have:

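import matplotlib as mpl
import matplotlib.colors  # makes mpl.colors available


def red_black_green():
    # Green at the low end, black in the middle, red at the high end
    cdict = {
        'red': ((0.0, 0.0, 0.0),
                (0.5, 0.0, 0.0),
                (1.0, 1.0, 1.0)),
        'blue': ((0.0, 0.0, 0.0),
                 (1.0, 0.0, 0.0)),
        'green': ((0.0, 0.0, 1.0),
                  (0.5, 0.0, 0.0),
                  (1.0, 0.0, 0.0))
    }

    # 100 discrete colour levels
    my_cmap = mpl.colors.LinearSegmentedColormap(
        'my_colormap', cdict, 100)

    return my_cmap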

And further down I do:

# Note: vmin and vmax are the minimum and the maximum of the data

# Adjust the max and min to scale these colors
if vmin > 0:
    norm = mpl.colors.Normalize(vmin=0, vmax=vmax / 1.08)
else:
    norm = mpl.colors.Normalize(vmin / 2, vmax / 2)

The numbers are totally empirical, which is why I want to change this into something more robust. How can I normalize my colormap based on the median, or do I need normalization at all?

Einar asked Oct 06 '22 07:10



1 Answer

By default, matplotlib normalises the colormap so that its maximum maps to the maximum of your data, and likewise its minimum to the minimum of your data. This means the middle of the colormap lines up with the midpoint of the data range, (min + max) / 2, which matches the median only when the data are roughly symmetric about it.
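A quick numerical check of that default mapping, using a deliberately skewed, made-up sample so that the range midpoint and the median differ (the sample values are assumptions chosen only for illustration):

```python
import numpy as np
import matplotlib.colors as mcolors

# Hypothetical skewed sample, chosen only for illustration
data = np.array([1.0, 1.2, 1.5, 2.0, 9.0])

# With no arguments, Normalize takes vmin/vmax from the first data it sees
norm = mcolors.Normalize()
scaled = norm(data)  # linear rescaling of [min, max] onto [0, 1]

midpoint = (data.min() + data.max()) / 2  # centre of the data range
median = np.median(data)                  # statistical median

mid_pos = float(norm(midpoint))  # position of the range midpoint on the colormap
med_pos = float(norm(median))    # position of the median on the colormap

print(mid_pos, med_pos)
```

Here the range midpoint lands in the middle of the colormap, while the median lands much closer to the low end.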

Here's an example:

from numpy.random import rand
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

cdict = {'red':   ((0.0, 0.0, 0.0),
                   (0.5, 0.0, 0.0),
                   (1.0, 1.0, 1.0)),
         'blue':  ((0.0, 0.0, 0.0),
                   (1.0, 0.0, 0.0)),
         'green': ((0.0, 0.0, 1.0),
                   (0.5, 0.0, 0.0),
                   (1.0, 0.0, 0.0))}

cmap = mcolors.LinearSegmentedColormap(
    'my_colormap', cdict, 100)

ax = plt.subplot(111)
im = ax.imshow(2*rand(20, 20) + 1.5, cmap=cmap)
plt.colorbar(im)
plt.show()

Notice that the middle of the colour bar is at about 2.5. This is the midpoint of the data range: (min + max) / 2 ≈ (1.5 + 3.5) / 2 = 2.5. Because the data here are uniformly distributed, it is also close to their median.
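If the data are skewed, the centre of the range and the true median drift apart, and a norm with an explicit centre is one option. Below is a minimal sketch, assuming matplotlib 3.2 or later (where TwoSlopeNorm is available) and using made-up exponential data in place of the real matrix:

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

# Made-up skewed data standing in for the real matrix (an assumption)
rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=(20, 20))

# Same green-black-red colormap as in the example
cdict = {'red':   ((0.0, 0.0, 0.0),
                   (0.5, 0.0, 0.0),
                   (1.0, 1.0, 1.0)),
         'blue':  ((0.0, 0.0, 0.0),
                   (1.0, 0.0, 0.0)),
         'green': ((0.0, 0.0, 1.0),
                   (0.5, 0.0, 0.0),
                   (1.0, 0.0, 0.0))}
cmap = mcolors.LinearSegmentedColormap('my_colormap', cdict, 100)

# Pin the middle of the colormap (black) to the median of the data;
# values below and above the median are scaled linearly on each side.
median_value = np.nanmedian(data)
norm = mcolors.TwoSlopeNorm(vcenter=median_value,
                            vmin=np.nanmin(data),
                            vmax=np.nanmax(data))

fig, ax = plt.subplots()
im = ax.imshow(data, cmap=cmap, norm=norm)
fig.colorbar(im)
# Call plt.show() or fig.savefig(...) to view the result
```

For a pandas DataFrame, passing matrix.to_numpy() in place of data should work the same way. Note that TwoSlopeNorm requires vmin < vcenter < vmax, so it will raise an error if the median coincides with the minimum or maximum.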

Hope this helps.

dmcdougall answered Oct 12 '22 16:10
