 

Logarithmic color scale in plotly

I'm trying to visualize data that contains some outliers using Plotly and Python 3. The outliers make the color scale legend look bad: there are only a few high data points, but the range between 2k and 10k takes up most of the legend.

So the question is: how can I change the appearance of the color legend on the right (see the image below) so that it mostly shows the differences between 0 and 2k? Unfortunately, I couldn't find an answer in this doc file.

Sample code (jupyter notebook):

import numpy as np
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
from plotly.graph_objs import *
init_notebook_mode()

x = np.random.randn(100,1) + 3
y = np.random.randn(100,1) + 10
x = np.reshape(x, 100)
y = np.reshape(y, 100)

color = np.random.randint(0,1000, [100])
color[[1,3,5]] = color[[1,3,5]] + 10000 # create outliers in color var

trace = Scatter(
    x = x,
    y = y,
    mode = 'markers',
    marker=dict(
        color = color,
        showscale=True,
        colorscale = [[0, 'rgb(166,206,227, 0.5)'],
                      [0.05, 'rgb(31,120,180,0.5)'],
                      [0.1, 'rgb(178,223,138,0.5)'],
                      [0.15, 'rgb(51,160,44,0.5)'],
                      [0.2, 'rgb(251,154,153,0.5)'],
                      [1, 'rgb(227,26,28,0.5)']
                     ]
    )
)

fig = Figure(data=[trace])
iplot(fig)

Plot

What I'm looking for:

Legend

Claud H asked May 17 '18 13:05


2 Answers

You can accomplish what I think you're after by customizing the colorscale, cmin, and cmax properties to have a discrete color change at exactly 2000. Then you can customize colorbar.tickvals to label the boundary as 2000. See https://plot.ly/python/reference/#scatter-marker-colorbar.

import numpy as np
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
from plotly.graph_objs import *
init_notebook_mode()

x = np.random.randn(100,1) + 3
y = np.random.randn(100,1) + 10
x = np.reshape(x, 100)
y = np.reshape(y, 100)

color = np.random.randint(0,1000, [100])
color[[1,3,5]] = color[[1,3,5]] + 10000 # create outliers in color var

bar_max = 2000
factor = 0.9  # Normalized location where continuous colorscale should end

trace = Scatter(
    x = x,
    y = y,
    mode = 'markers',
    marker=dict(
        color = color,
        showscale=True,
        cmin=0,
        cmax= bar_max/factor,
        colorscale = [[0, 'rgb(166,206,227, 0.5)'],
                      [0.05, 'rgb(31,120,180,0.5)'],
                      [0.2, 'rgb(178,223,138,0.5)'],
                      [0.5, 'rgb(51,160,44,0.5)'],
                      [factor, 'rgb(251,154,153,0.5)'],
                      [factor, 'rgb(227,26,28,0.5)'],
                      [1, 'rgb(227,26,28,0.5)']
                     ],
        colorbar=dict(
            tickvals = [0, 500, 1000, 1500, 2000],
            ticks='outside'
        )
    )
)

fig = Figure(data=[trace])
iplot(fig)

New figure result
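
To see why the color change lands at exactly 2000, it can help to convert the normalized colorscale positions back into data values. This is a minimal sketch, assuming the same bar_max and factor as in the code above; the helper name stop_to_value is hypothetical and not part of Plotly.

```python
# Plotly maps a normalized colorscale position p in [0, 1] linearly
# onto the interval [cmin, cmax].
bar_max = 2000
factor = 0.9
cmin = 0
cmax = bar_max / factor  # about 2222.2


def stop_to_value(p, cmin, cmax):
    """Return the data value at normalized position p (hypothetical helper)."""
    return cmin + p * (cmax - cmin)


stops = [0, 0.05, 0.2, 0.5, factor, 1]
stop_values = [stop_to_value(p, cmin, cmax) for p in stops]
for p, v in zip(stops, stop_values):
    print(f"{p:>4} -> {v:7.1f}")
```

The stop at factor corresponds to bar_max, so the continuous part of the scale covers 0 to 2000. Values above cmax, including the outliers, are drawn with the last color of the scale.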

Jon Mease answered Oct 19 '22 02:10


Since you asked a precise question, I'll try to reply with a precise answer, even though I don't think this is the best approach in data visualization. I explain why below.

Anyway, you can take the natural logarithm of the color values, which "squeezes" your data into a much smaller interval. The logarithm of a value is the power to which the number e must be raised to produce that value. You can use log10 instead if you're more comfortable with it.

The code is very simple; I include only the trace definition, as the rest is unchanged. I used a standard colorscale for convenience, since the range of values is continuous.

trace = Scatter(
    x = x,
    y = y,
    mode = 'markers',
    marker=dict(
        # log(0) is -inf, so clamp at 1 before taking the log
        color = np.log(np.maximum(color, 1)),
        showscale=True,
        colorscale = 'RdBu'
    )
)


As I said, transforming the values with a logarithm isn't always the best choice. It forces the observer into a rough reading of the graph. For example, the orange markers in my plot range between 410 and 950, yet can you tell the difference?
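
One way to soften this reading problem is to keep the log-transformed colors but label the colorbar in the original units. This is only a sketch under stated assumptions: it uses log10, the helper log_color_marker and its floor parameter are hypothetical names, and the tick values are picked by hand. It relies on the colorbar properties tickvals and ticktext.

```python
import math


def log_color_marker(values, tick_values, floor=1.0):
    """Build a marker dict colored by log10(values), labeled in original units.

    `floor` is an assumption of this sketch: values below it are clamped,
    because log10 is undefined for zero and negative numbers.
    """
    log_values = [math.log10(max(v, floor)) for v in values]
    return dict(
        color=log_values,
        showscale=True,
        colorscale='RdBu',
        colorbar=dict(
            # Ticks sit at log10 positions but display the original values.
            tickvals=[math.log10(t) for t in tick_values],
            ticktext=[str(t) for t in tick_values],
            ticks='outside',
        ),
    )


marker = log_color_marker([0, 5, 410, 950, 10500], [1, 10, 100, 1000, 10000])
```

The resulting dict can be passed as marker=marker in the Scatter call above. The colors are still compressed, so nearby values remain hard to tell apart, but the legend at least reads in the same units as the data.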

el_Rinaldo answered Oct 19 '22 01:10