In my application I'm transitioning from R to native Python (scipy + matplotlib) where possible, and one of the biggest tasks was converting an R heatmap to a matplotlib heatmap. This post guided me through the porting. While most of it was painless, I'm still not convinced about the colormap.
Before showing code, an explanation: in the R code I defined "breaks", i.e. a fixed number of points starting from the lowest value up to 10, and ideally centered on the median value of the data. Its equivalent here would be with numpy.linspace:
# Matrix is a DataFrame object from pandas
import numpy as np
data_min = min(matrix.min(skipna=True))
data_max = max(matrix.max(skipna=True))
median_value = np.median(matrix.median(skipna=True))
range_min = np.linspace(0, median_value, 50)
range_max = np.linspace(median_value, data_max, 50)
breaks = np.concatenate((range_min, range_max))
This gives us 100 points that will be used for coloring. However, I'm not sure how to do the exact same thing in Python. Currently I have:
import matplotlib as mpl

def red_black_green():
    # Green at the low end, black in the middle, red at the high end
    cdict = {
        'red': ((0.0, 0.0, 0.0),
                (0.5, 0.0, 0.0),
                (1.0, 1.0, 1.0)),
        'blue': ((0.0, 0.0, 0.0),
                 (1.0, 0.0, 0.0)),
        'green': ((0.0, 0.0, 1.0),
                  (0.5, 0.0, 0.0),
                  (1.0, 0.0, 0.0))
    }
    my_cmap = mpl.colors.LinearSegmentedColormap(
        'my_colormap', cdict, 100)
    return my_cmap
And further down I do:
# Note: vmin and vmax are the minimum and the maximum of the data
# Adjust the max and min to scale these colors
if vmin > 0:
    norm = mpl.colors.Normalize(vmin=0, vmax=vmax / 1.08)
else:
    norm = mpl.colors.Normalize(vmin / 2, vmax / 2)
The numbers are totally empirical, which is why I want to change this into something more robust. How can I normalize my colormap based on the median, or do I need normalization at all?
By default, matplotlib normalises the colormap linearly, so that the lowest colormap value corresponds to the minimum of your data and the highest colormap value to the maximum. This means that the middle of the colormap lines up with the midpoint of the data range, (min + max) / 2. That midpoint coincides with the median of the data only when the data are distributed symmetrically about it.
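To make the linear mapping concrete, here is a small sketch using `matplotlib.colors.Normalize` directly. The sample values are made up for illustration and are deliberately skewed, so that the midpoint of the range and the median differ.

```python
import numpy as np
import matplotlib.colors as mcolors

# Made-up, skewed sample: most values are small, one is large
data = np.array([1.0, 1.2, 1.3, 1.5, 9.0])

# Linear normalisation from the data minimum to the data maximum,
# which is what imshow does when no norm, vmin or vmax is given
norm = mcolors.Normalize(vmin=data.min(), vmax=data.max())

midpoint = (data.min() + data.max()) / 2   # middle of the data range
median = np.median(data)                   # middle of the sorted values

mid_position = float(norm(midpoint))       # position in the colormap, 0..1
median_position = float(norm(median))

print(mid_position)      # the midpoint sits at the centre of the colormap
print(median_position)   # the median sits much closer to the low end
```

For skewed data like this, the centre colour of the map (black in the colormap above) marks the midpoint of the range rather than the median.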
Here's an example:
from numpy.random import rand
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
cdict = {'red': ((0.0, 0.0, 0.0),
                 (0.5, 0.0, 0.0),
                 (1.0, 1.0, 1.0)),
         'blue': ((0.0, 0.0, 0.0),
                  (1.0, 0.0, 0.0)),
         'green': ((0.0, 0.0, 1.0),
                   (0.5, 0.0, 0.0),
                   (1.0, 0.0, 0.0))}
cmap = mcolors.LinearSegmentedColormap(
    'my_colormap', cdict, 100)
ax = plt.subplot(111)
im = ax.imshow(2*rand(20, 20) + 1.5, cmap=cmap)
plt.colorbar(im)
plt.show()
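Notice the middle of the colour bar takes a value of approximately 2.5. This is the midpoint of the data range: (min + max) / 2 = (1.5 + 3.5) / 2 = 2.5. Because the random data here are uniformly distributed, the median is also close to 2.5.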
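If the data are skewed and you want the centre of the colormap to sit on the median instead, one option is `matplotlib.colors.TwoSlopeNorm`, available in Matplotlib 3.2 and later. The sketch below is not taken from the original code: it uses made-up exponential data and assumes the median lies strictly between the minimum and the maximum, which `TwoSlopeNorm` requires.

```python
import numpy as np
import matplotlib.colors as mcolors

# Made-up skewed data standing in for the real matrix (assumption)
rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=(20, 20))

data_min = float(data.min())
data_max = float(data.max())
median_value = float(np.median(data))

# Piecewise-linear normalisation: [data_min, median] maps to [0, 0.5]
# and [median, data_max] maps to [0.5, 1], so the colormap centre
# lands on the median. Requires data_min < median_value < data_max.
norm = mcolors.TwoSlopeNorm(vmin=data_min, vcenter=median_value, vmax=data_max)

centre_position = float(norm(median_value))
low_position = float(norm(data_min))
high_position = float(norm(data_max))
print(low_position, centre_position, high_position)
```

The resulting `norm` can then be passed to the plot, for example `ax.imshow(data, cmap=cmap, norm=norm)`, in place of the hand-tuned `Normalize` bounds.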
Hope this helps.