
The `hue` parameter in Seaborn.relplot() skips an integer when given numerical data?

The hue parameter skips one integer.

import pandas as pd
import seaborn as sns

d = {'column1':[1,2,3,4,5], 'column2':[2,4,5,2,3], 'cluster':[0,1,2,3,4]}

df = pd.DataFrame(data=d)

sns.relplot(x='column2', y='column1', hue='cluster', data=df)

All points are plotted, but the legend has no entry for cluster 2.

Python 2.7 Seaborn 0.9.0 Ubuntu 16.04 LTS

Bstampe asked Jul 25 '18 18:07



1 Answer

"Full" legend

If the hue column is numeric, seaborn assumes that it represents a continuous quantity and, by default, shows only what it considers a representative sample of values in the legend.

You can circumvent this by using legend="full", which gives every unique value its own legend entry.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = pd.DataFrame({'column1':[1,2,3,4,5], 'column2':[2,4,5,2,3], 'cluster':[0,1,2,3,4]})
sns.relplot(x='column2', y='column1', hue='cluster', data=df, legend="full")
plt.show()
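
To confirm that no cluster is skipped, the legend labels can be read back from the returned grid. This is only a sketch: it relies on the private `_legend` attribute of the grid, which is not a documented API and may differ between seaborn versions, and the helper name `legend_labels` is made up for illustration.

```python
import pandas as pd
import seaborn as sns


def legend_labels(grid):
    """Return the text of every entry in a figure-level legend."""
    # Assumption: `_legend` is a private attribute set when the grid adds
    # its legend; it is not a documented API and may change.
    return [text.get_text() for text in grid._legend.get_texts()]


df = pd.DataFrame({'column1': [1, 2, 3, 4, 5],
                   'column2': [2, 4, 5, 2, 3],
                   'cluster': [0, 1, 2, 3, 4]})

g = sns.relplot(x='column2', y='column1', hue='cluster', data=df, legend="full")
labels = legend_labels(g)

# Some versions may also list the variable name ("cluster") among the texts,
# so check membership instead of comparing against an exact list.
missing = [str(c) for c in df['cluster'] if str(c) not in labels]
print("clusters without a legend entry:", missing)
```

Running the same check without legend="full" should show which values the default legend leaves out on your seaborn version.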


Categoricals

An alternative is to make sure the values are treated as categorical. Unfortunately, even if you pass the numbers as strings, they are converted back to numbers and fall into the same mechanism described above. This may be seen as a bug.

However, one option is to use real categories, e.g. single letters:

'cluster':list("ABCDE")

This works fine:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

d = {'column1':[1,2,3,4,5], 'column2':[2,4,5,2,3], 'cluster':list("ABCDE")}

df = pd.DataFrame(data=d)

sns.relplot(x='column2', y='column1', hue='cluster', data=df)

plt.show()
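
If single letters lose the link to the original cluster numbers, a non-numeric prefix keeps the numbers readable while still producing labels that cannot be parsed as numbers. A minimal pandas sketch; the helper name `label_clusters` and the "cluster " prefix are illustrative choices, not part of seaborn or of the answer above.

```python
import pandas as pd


def label_clusters(df, column="cluster", prefix="cluster "):
    """Return a copy of df whose `column` holds labels such as 'cluster 0'."""
    out = df.copy()
    # Build plain strings like "cluster 0"; these cannot be converted to
    # floats, so they should be handled like the letter labels above.
    out[column] = out[column].map(lambda value: "{}{}".format(prefix, value))
    return out


df = pd.DataFrame({'column1': [1, 2, 3, 4, 5],
                   'column2': [2, 4, 5, 2, 3],
                   'cluster': [0, 1, 2, 3, 4]})

labeled = label_clusters(df)
print(labeled['cluster'].tolist())
```

The resulting frame can be passed as data=labeled to the same relplot call; the original df is left unchanged.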


Strings with customized palette

An alternative to the above is to use numbers converted to strings, and then make sure to use a custom palette with as many colors as there are unique hues.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

d = {'column1':[1,2,3,4,5], 'column2':[2,4,5,2,3], 'cluster':[1,2,3,4,5]}

df = pd.DataFrame(data=d)
df["cluster"] = df["cluster"].astype(str)

sns.relplot(x='column2', y='column1', hue='cluster', data=df, 
            palette=["b", "g", "r", "indigo", "k"])

plt.show()
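
Writing the colors out by hand gets tedious when the number of clusters changes. As a sketch, the palette can be sized from the data with `sns.color_palette`; the helper name `palette_for` and the choice of the "husl" palette are assumptions for illustration.

```python
import pandas as pd
import seaborn as sns


def palette_for(series, name="husl"):
    """Return a plain list with one color per unique value in `series`."""
    n_levels = series.nunique()
    # "husl" yields evenly spaced hues for any requested number of colors.
    return list(sns.color_palette(name, n_colors=n_levels))


df = pd.DataFrame({'column1': [1, 2, 3, 4, 5],
                   'column2': [2, 4, 5, 2, 3],
                   'cluster': [1, 2, 3, 4, 5]})
df["cluster"] = df["cluster"].astype(str)

palette = palette_for(df["cluster"])
print(len(palette), "colors for", df["cluster"].nunique(), "clusters")
```

The list can then be passed as palette=palette in the relplot call, in place of the hard-coded color names.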


ImportanceOfBeingErnest answered Oct 12 '22 15:10