Restrict jittered points to within a violin plot in ggplot2

The following code is used to generate the violin plot in ggplot2:

library(ggplot2)

ggplot(violin, aes(x = variable, y = log(value + 0.5), color = Group)) +
  geom_violin(scale = "width") +
  geom_jitter(aes(group = Group), position = position_jitterdodge()) +
  # `fun` and after_stat() replace the deprecated `fun.y` and `..y..` (ggplot2 >= 3.3.0)
  stat_summary(fun = "mean", geom = "crossbar", mapping = aes(ymin = after_stat(y), ymax = after_stat(y)),
               width = 1, position = position_dodge(), show.legend = FALSE) +
  theme(axis.text.x = element_text(angle = 45, margin = margin(0.5, unit = "cm")))

The resulting plot looks like this:

[Figure: the resulting violin plot, with some jittered points falling outside the violins]

As you can see, some points are jittered outside the boundary of the violin shape, and I need those points to be inside the violin. I've played with different levels of jittering but haven't had any success. I'd appreciate any pointers on how to achieve this.
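A minimal sketch of the idea the answers below rely on, assuming the violins are drawn with `scale = "width"` and there is a single grouping (no dodging): bound each point's horizontal offset by the violin's half-width at that point's y value. This is a hand-rolled approximation of what sina and quasirandom plots do, not the question's code; the example data, the `density_jitter()` helper and the `half = 0.4` bound are hypothetical.

```r
library(ggplot2)

set.seed(42)
# Hypothetical example data; the question's `violin` data frame is not available
df <- data.frame(group = rep(c("A", "B", "C"), each = 200),
                 value = c(rnorm(200), rnorm(200, mean = 1, sd = 0.5), rexp(200)))

# Hypothetical helper: horizontal offsets bounded by the violin outline.
# Assumes geom_violin(scale = "width"), where every violin's widest point has
# half-width 0.45 (default width 0.9); `half` is kept slightly smaller.
density_jitter <- function(y, half = 0.4) {
  d <- density(y)                             # gaussian kernel, bw = "nrd0", like geom_violin
  dens_at_y <- approx(d$x, d$y, xout = y)$y   # estimated density at each observation
  rel <- dens_at_y / max(d$y)                 # relative violin width in [0, 1]
  runif(length(y), min = -1, max = 1) * half * rel
}

# Categories sit at 1, 2, 3 on the discrete axis (alphabetical order)
df$x_jit <- as.numeric(factor(df$group)) +
  ave(df$value, df$group, FUN = density_jitter)

p <- ggplot(df, aes(x = group, y = value, colour = group)) +
  geom_violin(scale = "width") +
  geom_point(aes(x = x_jit), size = 0.8)  # numeric x is placed on the discrete axis
p
```

Because `density()` uses the same default kernel and bandwidth rule as `geom_violin()`, the offsets should track the outline closely; `half` stays a little below 0.45 to absorb small differences in the grid each function evaluates the density on.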

asked Jun 27 '18 at 19:06 by akh22



3 Answers

The ggbeeswarm package provides geom_quasirandom() and geom_beeswarm(), which do exactly what you are looking for: https://github.com/eclarke/ggbeeswarm
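A short sketch with `geom_beeswarm()`, assuming the `mpg` data set shipped with ggplot2; the `cex` and `size` values are illustrative, not tuned. A beeswarm's horizontal spread is set by point size and count rather than by the violin, so dense groups may still need a smaller `cex` or `size` to stay inside the outline.

```r
library(ggplot2)
library(ggbeeswarm)

# geom_beeswarm() offsets points so they do not overlap; cex controls the spacing
p <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_violin(scale = "width") +
  geom_beeswarm(cex = 0.8, size = 0.8)
p
```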

answered Nov 10 '22 at 05:11 by Jannik Buhr


This is a somewhat old question, but I think there is a better solution.

As @Richard Telford pointed out in a comment, geom_sina() from the ggforce package is the best solution, in my opinion.

Simulate data:

df <- data.frame(data=rnorm(1200), 
                 group=rep(c("A","A","A", "B","B","C"),
                           200)
                 )

Make the plot:

library(ggplot2)
library(ggforce)  # provides geom_sina()

ggplot(df, aes(y = data, x = group, color = group)) +
  geom_violin() +
  geom_sina()

Result:

[Figure: violin plot of data by group (A, B, C) with sina-plot points]

Hope this is helpful.
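If the violins use `scale = "width"`, as in the question, `geom_sina()` can be given the same scaling so the points follow the same outline. A sketch, assuming a recent ggforce release in which `scale` takes the same character values as `geom_violin()` (older releases took `TRUE`/`FALSE`):

```r
library(ggplot2)
library(ggforce)

set.seed(1)
df <- data.frame(data = rnorm(1200),
                 group = rep(c("A", "A", "A", "B", "B", "C"), 200))

# Same scaling in both layers, so every violin and every sina share one maximum width
p <- ggplot(df, aes(x = group, y = data, colour = group)) +
  geom_violin(scale = "width") +
  geom_sina(scale = "width")
p
```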

answered Nov 10 '22 at 04:11 by Alfonso


Option 1

Use the function geom_quasirandom() from the ggbeeswarm package:

The quasirandom geom is a convenient means to offset points within categories to reduce overplotting. It uses the vipor package.

library(ggplot2)
library(ggbeeswarm)

p <- ggplot(mpg, aes(class, hwy))
p + geom_violin(width = 1.3) + geom_quasirandom(alpha = 0.2, width = 0.2)

[Figure: violin plot of hwy by class with quasirandom points]
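The question's plot dodges two groups within each variable. A sketch of the same approach in that layout, assuming hypothetical data (`dat` with variables v1 to v3 and groups ctrl/trt): `geom_quasirandom()`'s `dodge.width` lines the points up with the dodged violins and `width` caps their spread. The values are illustrative guesses, and because vipor's density estimate is not identical to `geom_violin()`'s, a few points may still touch the outline.

```r
library(ggplot2)
library(ggbeeswarm)

set.seed(1)
# Hypothetical long-format data shaped like the question's (variable x Group)
dat <- data.frame(variable = rep(c("v1", "v2", "v3"), each = 100),
                  Group = rep(c("ctrl", "trt"), times = 150),
                  value = rexp(300))

dodge_w <- 0.9  # geom_violin() dodges by its default width of 0.9
p <- ggplot(dat, aes(x = variable, y = log(value + 0.5), colour = Group)) +
  geom_violin(scale = "width", position = position_dodge(width = dodge_w)) +
  # with two groups each violin is about 0.45 wide (half-width about 0.225),
  # so keep the maximum spread below that
  geom_quasirandom(dodge.width = dodge_w, width = 0.15)
p
```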

Option 2

This is not a satisfactory answer, because restricting the horizontal jitter defeats the purpose of handling overplotting. Still, you can widen the violins (width = 1.3), use alpha for transparency, and limit the horizontal jitter (width = .02).

library(ggplot2)

p <- ggplot(mpg, aes(class, hwy))
p + geom_violin(width = 1.3) + geom_jitter(alpha = 0.2, width = .02)

[Figure: violin plot of hwy by class with narrowly jittered points]
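Applied to the question's dodged plot, the same idea means keeping the jitter in `position_jitterdodge()` small and making its `dodge.width` match the violins. By default `position_jitterdodge()` dodges by 0.75, while `geom_violin()` dodges by its width of 0.9, so in the question's code the points are likely drawn off-centre relative to their violins. A sketch with hypothetical data; the `jitter.width = 0.1` value is an illustrative guess:

```r
library(ggplot2)

set.seed(1)
# Hypothetical data shaped like the question's (variable x Group)
dat <- data.frame(variable = rep(c("v1", "v2", "v3"), each = 100),
                  Group = rep(c("ctrl", "trt"), times = 150),
                  value = rexp(300))

p <- ggplot(dat, aes(x = variable, y = log(value + 0.5), colour = Group)) +
  # the violins are dodged by 0.9 ...
  geom_violin(scale = "width", position = position_dodge(width = 0.9)) +
  # ... so dodge the points by the same amount and keep the jitter narrow
  geom_point(position = position_jitterdodge(jitter.width = 0.1, dodge.width = 0.9))
p
```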

answered Nov 10 '22 at 05:11 by mpalanco