Unexpected colors in multiple scatterplots in matplotlib

I'm sure I'm messing up something really simple here, but can't seem to figure it out. I'm simply trying to plot groups of data as scatterplots with different colors for each group by cycling through a dataframe and repeatedly calling ax.scatter. A minimal example is:

import numpy as np; import pandas as pd; import matplotlib.pyplot as plt; import seaborn as sns
%matplotlib inline

df = pd.DataFrame({"Cat":list("AAABBBCCC"), "x":np.random.rand(9), "y":np.random.rand(9)})

fig, ax = plt.subplots()
for i,cat in enumerate(df.Cat.unique()):
    print(i, cat, sns.color_palette("husl",3)[i])
    ax.scatter(df[df.Cat==cat].x.values, df[df.Cat==cat].y.values, marker="h",s=70,
               label = cat, color=sns.color_palette("husl",3)[i])
ax.legend(loc=2)

I added the print statement for my own sanity to confirm that I am indeed cycling through the groups and choosing different colors. The output however looks as follows:

[Image: the resulting scatter plot]

(If this is slightly hard to see: the groups A, B, and C have three very similar blues according to the legend, however all scatterpoints have different and seemingly unrelated colors, which aren't even identical across groups)

What is going on here?

Nils Gudat asked Jun 07 '26 12:06


2 Answers

You could use the pandas DataFrame.plot.scatter() method, specifying the target ax and repeating the plot call once per group, so that all groups are drawn on a single axes, ax.

# set random seed
np.random.seed(42)                       

fig, ax = plt.subplots()
for i,label in enumerate(df['Cat'].unique()):
    # select the rows whose Cat matches the current label (other rows become NaN in X and Y)
    df['X'] = df[df['Cat']==label]['x']       
    df['Y'] = df[df['Cat']==label]['y']
    df.plot.scatter(x='X',y='Y',color=sns.color_palette("husl",3)[i],label=label,ax=ax)
ax.legend(loc=2)


Nickil Maveli answered Jun 10 '26 03:06


I should have spent a bit more time whittling down the minimal working example. It turns out the problem was with the color taken from sns.color_palette: each entry is a (float, float, float) RGB tuple, which confuses scatter, as it apparently interprets one of the numbers as the alpha value.

The problem is solved by replacing

color = sns.color_palette("husl",3)[i]

with

color = sns.color_palette("husl",3)[i] + (1.,)

to add an explicit value for alpha.
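
The fix amounts to turning each RGB triple into an explicit RGBA 4-tuple before it reaches scatter. Below is a minimal sketch of that step only. It assumes the standard library's colorsys as a stand-in for sns.color_palette, so that it runs without seaborn (the hues will not match the husl palette), and the helper names evenly_spaced_rgb and with_alpha are hypothetical:

```python
import colorsys

def evenly_spaced_rgb(n, lightness=0.65, saturation=0.9):
    """Return n RGB tuples with evenly spaced hues (a stand-in for a seaborn palette)."""
    return [colorsys.hls_to_rgb(k / n, lightness, saturation) for k in range(n)]

def with_alpha(rgb, alpha=1.0):
    """Append an explicit alpha channel, turning (r, g, b) into (r, g, b, a)."""
    return tuple(rgb) + (alpha,)

categories = ["A", "B", "C"]
palette = evenly_spaced_rgb(len(categories))

# One unambiguous RGBA color per category
colors = {cat: with_alpha(rgb) for cat, rgb in zip(categories, palette)}

for cat, rgba in colors.items():
    print(cat, rgba)
```

Each value in colors could then be passed as color=colors[cat] in the loop from the question. Recent matplotlib documentation also notes that a bare RGB or RGBA sequence passed as c can be indistinguishable from an array of values to be colormapped, and suggests a 2D array with a single row instead, for example c=[rgb].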

Nils Gudat answered Jun 10 '26 02:06


