 

scatter plots in seaborn/matplotlib with point size and color given by continuous dataframe column

I would like to make a scatter plot in seaborn/matplotlib where the size of the points is determined by a (continuous) value in one dataframe column, and the color of the points is determined by the continuous value of another column. In ggplot, the way to do it is:

ggplot(iris) + geom_point(aes(x=Sepal.Width, y=Sepal.Length, size=Petal.Width, color=Petal.Length))

(Color and size here are continuous, not categorical, values.)

What's the syntax for this in seaborn/matplotlib?

asked Mar 12 '17 at 23:03 by jll


2 Answers

The following reproduces the diagram from the question. Obtaining a legend for the point sizes is a bit cumbersome, because we have to manually define some proxy artists to put into the legend.

import seaborn as sns
import matplotlib.pyplot as plt

iris = sns.load_dataset("iris")

# Color encodes petal_length (continuous colormap); marker area grows with petal_width**2
plt.scatter(iris.sepal_width, iris.sepal_length,
            c=iris.petal_length, s=(iris.petal_width**2)*60, cmap="viridis")

plt.colorbar(label="petal_length")
plt.xlabel("sepal_width")
plt.ylabel("sepal_length")

# Make a size legend: one empty proxy scatter per reference petal_width value
pws = [0.5, 1, 1.5, 2., 2.5]
for pw in pws:
    plt.scatter([], [], s=(pw**2)*60, c="k", label=str(pw))

# The main scatter has no label, so only the proxies appear in the legend
plt.legend(labelspacing=1.2, title="petal_width", borderpad=1,
           frameon=True, framealpha=0.6, edgecolor="k", facecolor="w")

plt.show()

Note that the size argument s denotes the area of the dots (in points squared). So, for the diameter to be proportional to the quantity shown, that quantity has to be squared.
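To make the squaring rule concrete, here is a small sketch (not part of the original answer). The helper name marker_area is hypothetical, and the scale factor 60 is simply the one used in the code above. For the default circular marker, the drawn diameter in points is the square root of s, so scaling s by the square of the value keeps the diameter proportional to the value.

```python
import numpy as np

def marker_area(values, scale=60.0):
    """Hypothetical helper: return scatter sizes s (points**2) whose
    square root, i.e. the marker diameter, is proportional to `values`."""
    values = np.asarray(values, dtype=float)
    return scale * values ** 2

petal_widths = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
s = marker_area(petal_widths)
diameters = np.sqrt(s)            # marker diameter in points
print(diameters / petal_widths)   # every ratio is sqrt(60), about 7.746
```

Passing marker_area(iris.petal_width) as s gives the same sizes as s=(iris.petal_width**2)*60 in the answer's code.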

answered Oct 01 '22 at 04:10 by ImportanceOfBeingErnest


Here is how I would solve this using Altair:

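import altair as alt
import seaborn as sns

iris = sns.load_dataset("iris")

# ":Q" marks each field as quantitative, giving continuous color and size scales
chart = alt.Chart(iris).mark_circle().encode(
    x='sepal_width:Q',
    y='sepal_length:Q',
    color='petal_length:Q',
    size='petal_width:Q',
)
chart  # displays in Jupyter; outside a notebook, use chart.save("chart.html")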


For such plots, Altair might be a great choice. Quoting from their website:

Altair is a declarative statistical visualization library for Python, based on Vega-Lite. With Altair, you can spend more time understanding your data and its meaning.

answered Oct 01 '22 at 02:10 by Nipun Batra
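For reference, seaborn 0.9 and later also offer sns.scatterplot, whose hue and size parameters accept numeric columns directly (this approach is not from either answer). The sketch below is a minimal example under assumptions: it uses a small synthetic DataFrame with iris-like column names instead of sns.load_dataset, so it runs offline, and sizes=(20, 200) is an arbitrary choice for the minimum and maximum marker size in matplotlib's s units.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic stand-in data (hypothetical values) with iris-like column names
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "sepal_width": rng.uniform(2.0, 4.5, 50),
    "sepal_length": rng.uniform(4.0, 8.0, 50),
    "petal_length": rng.uniform(1.0, 7.0, 50),
    "petal_width": rng.uniform(0.1, 2.5, 50),
})

# Numeric hue -> continuous colormap; numeric size -> marker sizes within `sizes`
ax = sns.scatterplot(data=df, x="sepal_width", y="sepal_length",
                     hue="petal_length", size="petal_width",
                     sizes=(20, 200), palette="viridis")
plt.show()
```

Unlike the matplotlib version, seaborn builds the legend for you, but for numeric variables it shows a few sample levels rather than a colorbar.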