
For geom_violin, how is the total area of all violins specified?

In a call to geom_violin within ggplot2, you can specify that the area of each violin should be proportional to the number of observations making up that violin by specifying scale="count".

I assume this operates internally by taking some total amount of area (let's call this amount X) and dividing it proportionally among all violins to be plotted. This is what I want, except that it can produce pretty narrow violins when N differs a lot between groups and some groups have relatively low N. In my case, this just makes the fill color kind of hard to see.

I think this can be largely solved, in my case at least, by simply expanding X a little bit so that the really small violins get just enough area to still be readable. In other words, I want to retain variation in area between violins according to the number of observations but increase the "pool" of total area being divided amongst violins, so that every one gets slightly bigger.

Anyone have any idea how one might accomplish this? There's gotta be a toggle for this. I've tried fussing with arguments to geom_violin such as width, size, violinwidth, and such, but no luck so far...

EDIT: Code for a boring but reproducible "sample" data set that one can experiment with.

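library(ggplot2)
set.seed(1)  # fixed seed so the random y values are the same on every run
y = runif(100, 1, 10)
x = as.factor(rep(c(1,2), times=50))
z = as.factor(c(rep(1, 10), rep(2, 90)))  # 10 observations in group 1, 90 in group 2
df = data.frame(x, y, z)
ggplot(df, aes(x=x, y=y, fill=z)) + geom_violin(scale="count")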
Bajcz asked Nov 08 '22 11:11



1 Answer

You can do this by adjusting the width parameter of geom_violin. Make sure to also use position_dodge so that the wider violins do not overlap.
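To see why width helps, here is a rough base R sketch of how the violins in the question's sample data compare in size. It assumes that, with scale="count", ggplot2 scales each violin by its count relative to the largest count (n / max(n)) and then stretches everything by width. That rule, the 0.9 default width, and the variable names below are my assumptions for illustration, and the sketch ignores differences in density shape between groups.

```r
# Rebuild the grouping columns from the question's sample data.
x <- as.factor(rep(c(1, 2), times = 50))
z <- as.factor(c(rep(1, 10), rep(2, 90)))

# Observations per violin (one violin per x/z combination).
n <- as.vector(table(x, z))

# Assumed rule for scale = "count": relative size is n / max(n).
relative_width <- n / max(n)

# Approximate peak widths under two settings of `width`.
# 0.9 is assumed to be the default width for a discrete x axis.
peak_default <- relative_width * 0.9
peak_wide    <- relative_width * 2

print(data.frame(n, relative_width, peak_default, peak_wide))
```

Under these assumptions the ratio between small and large violins stays the same, so the variation by count is kept, while every violin gets wider by the same factor.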

Using your data,

ggplot(df, aes(x=x, y=y, fill=z)) + geom_violin(scale="count", width=2)

will give the following plot: [image]

You can leave some gap between the violins by using position_dodge:

ggplot(df, aes(x=x, y=y, fill=z)) + geom_violin(scale="count", width=2, position=position_dodge(width=0.5))

This will give you the following non-overlapping plot: [image]

rm167 answered Nov 15 '22 07:11
