Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

adding text to ggplot geom_jitter points that match a condition

Tags:

r

ggplot2

plyr

How can I add text to points rendered with geom_jittered to label them? geom_text will not work because I don't know the coordinates of the jittered dots. Could you capture the position of the jittered points so I can pass to geom_text?

My practical usage would be to plot a boxplot with the geom_jitter over it to show the data distribution and I would like to label the outliers dots or the ones that match certain condition (for example the lower 10% for the values used for color the plots).

One solution would be to capture the xy positions of the jittered plots and use it later in another layer, is that possible?

[update]

From Joran answer, a solution would be to calculate the jittered values with the jitter function from the base package, add them to a data frame and use them with geom_point. For filtering he used ddply to have a filter column (a logic vector) and use it for subsetting the data in geom_text.

He asked for a minimal dataset. I just modified his example (a unique identifier in the label colum)

dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
                      lab=paste('id_',1:300,sep='')) 

This is the result of joran example with my data and lowering the display of ids to the lowest 1% boxplot with jitter dots and label in lower 1% values

And this is a modification of the code to have colors by another variable and displaying some values of this variable (the lowest 1% for each group):

library("ggplot2")
#Create some example data
dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
                          lab=paste('id_',1:300,sep=''),quality= rnorm(300))

#Create a copy of the data and a jittered version of the x variable
datJit <- dat
datJit$xj <- jitter(as.numeric(factor(dat$x)))

#Create an indicator variable that picks out those
# obs that are in lowest 1% by x
datJit <- ddply(datJit,.(x),.fun=function(g){
               g$grp <- g$y <= quantile(g$y,0.01);
               g$top_q <- g$qual <= quantile(g$qual,0.01);
               g})

#Create a boxplot, overlay the jittered points and
# label the bottom 1% points
ggplot(dat,aes(x=x,y=y)) +
  geom_boxplot() +
  geom_point(data=datJit,aes(x=xj,colour=quality)) +
  geom_text(data=subset(datJit,grp),aes(x=xj,label=lab)) +
  geom_text(data=subset(datJit,top_q),aes(x=xj,label=sprintf("%0.2f",quality)))

boxplot with jitter dots and label in lower 1% values

like image 928
Pablo Marin-Garcia Avatar asked Jul 01 '11 17:07

Pablo Marin-Garcia


1 Answers

Your question isn't completely clear; for example, you mention labeling points at one point but also mention coloring points, so I'm not sure which you really mean, or perhaps both. A reproducible example would be very helpful. But using a little guesswork on my part, the following code does what I think you're describing:

#Create some example data
dat <- data.frame(x=rep(letters[1:3],times=100),y=runif(300),
        lab=rep('label',300))

#Create a copy of the data and a jittered version of the x variable
datJit <- dat
datJit$xj <- jitter(as.numeric(factor(dat$x)))

#Create an indicator variable that picks out those 
# obs that are in lowest 10% by x
datJit <- ddply(datJit,.(x),.fun=function(g){
             g$grp <- g$y <= quantile(g$y,0.1); g})

#Create a boxplot, overlay the jittered points and 
# label the bottom 10% points
ggplot(dat,aes(x=x,y=y)) + 
    geom_boxplot() + 
    geom_point(data=datJit,aes(x=xj)) + 
    geom_text(data=subset(datJit,grp),aes(x=xj,label=lab))        
like image 163
joran Avatar answered Sep 25 '22 01:09

joran