I have started to really like violin plots, since they give me a much better feel than box plots for oddly shaped distributions. I like to automate a lot of things, and so I ran into a problem: when one variable has zero variance, the boxplot just gives you a line at that point. geom_violin, however, terminates with an error. What behavior would I like? Either draw a line or nothing at all, but please still give me the distributions for the other variables.
Ok, quick example:
library(ggplot2)
dff <- data.frame(x = factor(rep(1:2, each = 100)), y = c(rnorm(100), rep(0, 100)))
ggplot(dff, aes(x = x, y = y)) + geom_violin()
yields
Error in `$<-.data.frame`(`*tmp*`, "n", value = 100L) :
replacement has 1 row, data has 0
However, what works is:
ggplot(dff,aes(x=x,y=y)) + geom_boxplot()
Update:
The issue is resolved as of yesterday: https://github.com/hadley/ggplot2/issues/972
Update 2:
(from question author)
Wow, Hadley himself responded! geom_violin now behaves consistently with geom_density and base R density.
However, I don't think the behavior is optimal yet.
(1) The 'zero' problem
Just run it with my original example:
dff=data.frame(x=factor(rep(1:2, each=100)), y=c(rnorm(100), rep(0,100)))
ggplot(dff,aes(x=x,y=y)) + geom_violin(trim=FALSE)
Yielding this:
Is the plot on the right an appropriate representation of 'all zeroes'? I don't think so. It is better to have trimming that produces a single line to show that there is no variation in the data.
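A minimal base R sketch of why a constant group turns into a bump rather than a line, assuming geom_violin really does follow base R density() as stated above. When the data has zero spread, the default bandwidth rule bw.nrd0 falls back to a scale of 1, so the bandwidth is still positive. The variable names are illustrative only.

```r
# 100 identical values, like group 2 in the example above.
zeros <- rep(0, 100)

# density() uses bw.nrd0 by default. With zero spread, bw.nrd0 falls back
# to a scale of 1, so the bandwidth is 0.9 * 1 * n^(-1/5), not zero.
d <- density(zeros)
print(d$bw)

# With the default cut = 3, the evaluation grid extends 3 bandwidths
# on either side of the data, which is where the "bump" comes from.
print(range(d$x))
```

So the untrimmed shape reflects the fallback bandwidth, not any variation in the data.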
Workaround solution: Add a + geom_boxplot()
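For automated plotting, one way to sketch this workaround is to draw violins only for groups that actually vary and fall back to geom_boxplot for the constant ones, which collapses to a single line. The `varies` vector is a hypothetical helper, not part of ggplot2, and the seed is arbitrary.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility
dff <- data.frame(x = factor(rep(1:2, each = 100)),
                  y = c(rnorm(100), rep(0, 100)))

# TRUE for rows whose group contains more than one distinct y value.
varies <- as.logical(ave(dff$y, dff$x,
                         FUN = function(v) length(unique(v)) > 1))

# Violins for varying groups, boxplots (a flat line) for constant groups.
# drop = FALSE keeps every factor level on the x axis.
p <- ggplot(dff, aes(x = x, y = y)) +
  geom_violin(data = dff[varies, ]) +
  geom_boxplot(data = dff[!varies, ]) +
  scale_x_discrete(drop = FALSE)
print(p)
```

Because the violin layer never sees the constant group, this should behave the same whether or not the installed ggplot2 includes the fix.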
(2) I may actually want trim=TRUE
Example:
dff=data.frame(x=factor(rep(1:2, each=100)), y=c(rgamma(100,1,1), rep(0,100) ))
ggplot(dff,aes(x=x,y=y)) + geom_violin(trim=FALSE)
Now group 1 holds strictly positive data, and a standard kernel density estimate does not handle this correctly: with trim=FALSE the tails of the violin extend below zero. With trim=TRUE I can quickly see that the data is strictly positive.
I am not arguing that the current behavior is 'wrong', since it's in line with other functions. However, geom_violin may be used in different contexts, for exploring different data.frames with heterogeneous data types (positive+skewed or not, for instance).
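The effect of trimming on positive data can be checked in base R. This is a rough sketch that uses density() with cut = 0 as a stand-in for trim=TRUE; that correspondence is an assumption for illustration, not a statement about how ggplot2 implements trimming.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
y <- rgamma(100, 1, 1)   # strictly positive, right-skewed

# Default cut = 3: the grid extends 3 bandwidths past the data range.
untrimmed <- density(y)

# cut = 0: the grid stops at the smallest and largest observation.
trimmed <- density(y, cut = 0)

print(min(y) > 0)            # the data itself is positive
print(min(untrimmed$x) < 0)  # the untrimmed estimate reaches below zero
print(min(trimmed$x))        # the trimmed estimate starts at min(y)
```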
Three options for dealing with this until the ggplot2 issue is resolved:
1. [...] geom_violin will work.
2. The vioplot package, if you're not set on using ggplot2. vioplot doesn't throw an error when you feed it a bunch of identical values.
3. The Hmisc package includes a panel.bpplot (box-percentile plot) function that can create violin plots with the bwplot function from the lattice package. See the Examples section of ?panel.bpplot. It produces a single line when you feed it a vector of identical values.