I would like to make a scatter plot in seaborn/matplotlib where the size of the points is determined by a continuous value in one column of a dataframe, and the color of the points is determined by a continuous value in another column. In ggplot, the way to do it is:
ggplot(iris) + geom_point(aes(x=Sepal.Width, y=Sepal.Length, size=Petal.Width, color=Petal.Length))
(Color and size here are continuous, not categorical, values.)
What's the syntax for this in seaborn/matplotlib?
To set the size of markers in seaborn, pass the s parameter to scatterplot(). Because seaborn is built on matplotlib, s is forwarded to matplotlib's scatter and sets a fixed marker size. To vary the size of the points with a variable instead, pass a column name to the size parameter.
With seaborn's default colors, the hue argument colors the data points by another variable, so that points are drawn in different colors: g = sns.scatterplot(x="gdpPercap", y="lifeExp", hue="continent", data=gapminder)
The style argument of scatterplot() changes the marker shape by a variable. For example, setting style to a "sex" column in the dataframe draws the data points for males with a different marker than those for females.
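Putting the size and hue arguments described above together gives something close to the ggplot call in the question. The following is a minimal sketch, assuming a seaborn version that provides scatterplot(); the small dataframe is made up for illustration (iris-style column names, arbitrary values) so that the example runs without downloading a dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative, made-up values with iris-style column names.
df = pd.DataFrame({
    "sepal_width":  [3.5, 3.0, 3.2, 2.8, 3.3, 2.7],
    "sepal_length": [5.1, 4.9, 7.0, 6.1, 6.3, 5.8],
    "petal_length": [1.4, 1.4, 4.7, 4.0, 6.0, 5.1],
    "petal_width":  [0.2, 0.2, 1.4, 1.3, 2.5, 1.9],
})

ax = sns.scatterplot(
    data=df, x="sepal_width", y="sepal_length",
    hue="petal_length",   # numeric column, so colors are mapped continuously
    size="petal_width",   # numeric column, so sizes are mapped continuously
    sizes=(20, 200),      # smallest and largest marker area
    palette="viridis",
)
# Call plt.show() here to display the figure interactively.
```

Because both columns are numeric, seaborn treats them as continuous rather than categorical and draws a legend with a few representative values for each.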
The following reproduces the plot from the question using matplotlib directly. Obtaining a legend is a bit cumbersome, because we have to manually define some proxy artists to put into the legend and remove the first automatic legend entry, which is generated via the seaborn style.
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset("iris")
plt.scatter(iris.sepal_width, iris.sepal_length,
            c=iris.petal_length, s=(iris.petal_width**2)*60, cmap="viridis")
ax = plt.gca()
plt.colorbar(label="petal_length")
plt.xlabel("sepal_width")
plt.ylabel("sepal_length")
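# Make a legend from empty proxy scatters, one per reference petal width:
pws = [0.5, 1, 1.5, 2., 2.5]
for pw in pws:
    plt.scatter([], [], s=(pw**2)*60, c="k", label=str(pw))
h, l = plt.gca().get_legend_handles_labels()
# Keep only the proxy entries; depending on the matplotlib version, the data
# scatter may contribute an automatic first entry that should not be shown.
plt.legend(h[-len(pws):], l[-len(pws):], labelspacing=1.2, title="petal_width",
           borderpad=1, frameon=True, framealpha=0.6, edgecolor="k", facecolor="w")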
plt.show()
Note that the size argument s denotes the area of the dots. So in order to have the diameter be proportional to the quantity to show, the value has to be squared.
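As a minimal sketch of the squaring rule above: the helper marker_area below is hypothetical (it is not part of matplotlib), and the scale factor 60 is simply the one used in the answer's code. It converts data values into s values whose diameter grows linearly with the data.

```python
def marker_area(values, scale=60.0):
    """Return `s` values (areas) whose marker diameter is proportional to `values`."""
    return [scale * v ** 2 for v in values]

s = marker_area([0.5, 1.0, 2.0])
print(s)  # [15.0, 60.0, 240.0]
# Doubling a value doubles the diameter, which quadruples the area.
```

The resulting list can be passed directly as the s argument of plt.scatter.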
Here is how I would solve this using Altair:
from altair import Chart
import seaborn as sns
iris = sns.load_dataset("iris")
c = Chart(iris)
c.mark_circle().encode(
    x='sepal_width',
    y='sepal_length',
    color='petal_length',
    size='petal_width',
)
For such plots, Altair might be a great choice. Quoting from their website:
Altair is a declarative statistical visualization library for Python, based on Vega-Lite. With Altair, you can spend more time understanding your data and its meaning.