Plotly-Express: How to fix the color mapping when setting color by column name

Tags:

python

plotly

plotly-express

I am using plotly express for a scatter plot. The color of the markers is defined by a variable of my dataframe, as in the example below.

import pandas as pd
import numpy as np
import plotly.express as px

df = px.data.iris()

fig = px.scatter(df[df.species.isin(['virginica', 'setosa'])], x="sepal_width", y="sepal_length", color="species")
fig.show()

[Screenshot: scatter plot of the 'setosa' and 'virginica' subset]

When I add another value of this variable (here, the third species 'versicolor'), the color mapping changes: 'virginica' is red in the first plot, but green in the second.

fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species",size='petal_length', hover_data=['petal_width'])
fig.show()

[Screenshot: scatter plot of all three species]

How can I keep the mapping of the colors when adding variables?


asked Jan 13 '20 08:01

otwtm

2 Answers

I found a solution. px.scatter has an argument, color_discrete_map, which is exactly what I needed. It takes a dictionary whose keys are the values of the color column (here, the species) and whose values are the colors assigned to them.

import plotly.express as px    

df = px.data.iris()
color_discrete_map = {'virginica': 'rgb(255,0,0)', 'setosa': 'rgb(0,255,0)', 'versicolor': 'rgb(0,0,255)'}
fig = px.scatter(df[df.species.isin(['virginica', 'setosa'])], x="sepal_width", y="sepal_length", color="species", color_discrete_map=color_discrete_map)
fig.show()

[Screenshot: subset plot using color_discrete_map]

fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species", color_discrete_map=color_discrete_map)
fig.show()

[Screenshot: full plot using color_discrete_map]


answered Sep 18 '22 18:09

otwtm

Short answer:

1. Assign colors to variables with color_discrete_map:

color_discrete_map = {'virginica': 'blue', 'setosa': 'red', 'versicolor': 'green'}

or:

2. Manage the order of your data to enable the correct color cycle with:

order_df(df_input = df, order_by='species', order=['virginica', 'setosa', 'versicolor'])

... where order_df is a small custom function that subsets a long-format dataframe and orders its rows; you'll find the complete definition in the code snippets below.

The details:

1. You can map colors to variables directly with:

color_discrete_map = {'virginica': 'blue', 'setosa': 'red', 'versicolor': 'green'}

The downside is that you have to specify every variable name and its color yourself. That quickly becomes tedious if you're working with dataframes where the number of variables is not fixed, in which case it is much more convenient to follow the default color sequence, or one you specify to your liking. So I would rather manage the order of your dataset so that you get the desired color matching.

2. The source of the real challenge:

px.scatter() assigns colors to the values of the color column in the order in which they first appear in your dataframe. Here you're using two different sources, df and df[df.species.isin(['virginica', 'setosa'])] (let's name the latter df2). Running df2['species'].unique() will give you:

array(['setosa', 'virginica'], dtype=object)

And running df['species'].unique() will give you:

array(['setosa', 'versicolor', 'virginica'], dtype=object)

See how versicolor pops up in the middle? That's why the second color of the default sequence (red) is no longer assigned to 'virginica', but to 'versicolor' instead, and 'virginica' moves on to the third color (green).

Suggested solution:

So to build a complete solution, you have to find a way to specify the order of the variables in the source dataframe. That's very straightforward for a column with unique values, but a bit more work for a long-format dataframe such as this one. You could do it as described in the post "Changing row order in pandas dataframe without losing or messing up data". But below I've put together a simple function that takes care of both the subset and the order of the dataframe you'd like to plot with plotly express.

Running the complete code below with each of the lines under # data subsets active in turn gives you the following three plots:

Plot 1: order=['virginica']

[Screenshot of Plot 1]

Plot 2: order=['virginica', 'setosa']

[Screenshot of Plot 2]

Plot 3: order=['virginica', 'setosa', 'versicolor']

[Screenshot of Plot 3]

Complete code:

# imports
import pandas as pd
import plotly.express as px

# data
df = px.data.iris()

# function to subset and order a pandas
# dataframe of a long format
def order_df(df_input, order_by, order):
    # collect the rows for each value of `order_by`, in the sequence given by `order`
    blocks = [df_input[df_input[order_by] == var] for var in order]
    # stacking the blocks makes each value first appear in the requested order
    return pd.concat(blocks)

# data subsets (keep exactly one of these lines active)
# df_express = order_df(df_input = df, order_by='species', order=['virginica'])
# df_express = order_df(df_input = df, order_by='species', order=['virginica', 'setosa'])
df_express = order_df(df_input = df, order_by='species', order=['virginica', 'setosa', 'versicolor'])

# plotly
fig = px.scatter(df_express, x="sepal_width", y="sepal_length", color="species")
fig.show()

answered Sep 20 '22 18:09

vestland
