
stat_qq removes values when setting group

Tags: r, ggplot2

I am trying to make a QQ-plot in ggplot2, where a select few of the points should have a different shape. But when I map the shape to a variable in the aesthetics, stat_qq includes this variable to split the data (there are 2x3 factors involved).

Here is a reproducible example:

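library(ggplot2)
set.seed(331)

# 10 replicates of every method (a, b, c) x model (A, B) combination: 60 rows
df <- do.call(rbind, replicate(10, {expand.grid(method=factor(letters[1:3]), model=factor(LETTERS[1:2]))}, simplify=FALSE ))
df$x <- runif(nrow(df))
# y is shifted by 1 per method level, so the methods separate clearly
df$y <- rnorm(nrow(df), sd=0.2) + 1*as.integer(df$method)
# flag the 10 largest y values of method 'a' as "top"
df$top <- FALSE
df <- df[order(df$y, decreasing=TRUE),]
df$top[which(df$method=='a')[1:10]] <- TRUE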

So far, I have managed to make a simple QQ-plot:

ggplot(df, aes(sample=y, colour=method)) + stat_qq() + facet_grid(.~model)


This is basically what I want, except that a handful of the points in method 'a' should have a different shape, as indicated by the variable 'top'. From the code, we know that these correspond to the top 5 values in method 'a' in each model; i.e. the five leftmost red dots in each facet should have a different shape. Here I have attempted to add it as an aesthetic:

ggplot(df, aes(sample=y, colour=method, shape=top)) + stat_qq() + facet_grid(.~model)


Now it is quite clear that stat_qq has used the variable 'top' to split the data set, as the top 5 data points are plotted parallel to the non-top points. This is not what I intended.
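A minimal base-R sketch of why the split produces "parallel" points, assuming the theoretical quantiles are computed separately for each subset. The vector `y` and the flag `top` here are made-up stand-ins, not the data from the example above. When the flagged values are treated as their own group, they receive a fresh set of quantiles centred on zero instead of the largest quantiles of the full sample:

```r
set.seed(1)
y   <- rnorm(10)
top <- rank(-y) <= 3   # hypothetical flag: the 3 largest values

# theoretical normal quantiles computed on the whole sample
theo_all <- qqnorm(y, plot.it = FALSE)$x

# theoretical quantiles computed separately for top and non-top values,
# which is what splitting the data by 'top' amounts to
theo_split <- ave(y, top, FUN = function(v) qqnorm(v, plot.it = FALSE)$x)

# compare the quantiles assigned to the flagged values
data.frame(y = y[top], theo_all = theo_all[top], theo_split = theo_split[top])
```

In `theo_all` the flagged values sit at the upper end of the quantile range, whereas in `theo_split` they are spread symmetrically around zero, so they are drawn as a separate short line next to the rest.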

How can I instruct stat_qq how to group the data? I could try the group-aesthetic:

ggplot(df, aes(sample=y, colour=method, shape=top, group=method)) + stat_qq() + facet_grid(.~model)
Warning messages:
1: Removed 10 rows containing missing values (geom_point). 
2: Removed 10 rows containing missing values (geom_point). 


But for some reason, this entirely removes all data points connected to the model.

Any ideas how to overcome this?

MrGumble asked Jun 08 '26 12:06


1 Answer

Since you want to violate one of the fundamental concepts of ggplot2, it would be easier to do the calculations outside of ggplot:

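library(plyr)
# compute the theoretical normal quantiles per model/method group,
# ignoring 'top' so that it does not split the groups
df <- ddply(df, .(model, method), 
            transform, theo=qqnorm(y, plot.it=FALSE)[["x"]])

# plot the precomputed quantiles; shape=top now only changes the point shape
ggplot(df, aes(x=theo, y=y, colour=method, shape=top)) + 
    geom_point() + facet_grid(.~model)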
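If you would rather not load plyr, the same per-group calculation can be sketched in base R with `ave()`. This is an alternative to the `ddply()` call above, not something the answer itself proposes; the data frame is rebuilt with the code from the question so the snippet runs on its own:

```r
set.seed(331)

# rebuild the example data from the question
df <- do.call(rbind, replicate(10, expand.grid(method = factor(letters[1:3]),
                                               model  = factor(LETTERS[1:2])),
                               simplify = FALSE))
df$x <- runif(nrow(df))
df$y <- rnorm(nrow(df), sd = 0.2) + 1 * as.integer(df$method)
df$top <- FALSE
df <- df[order(df$y, decreasing = TRUE), ]
df$top[which(df$method == "a")[1:10]] <- TRUE

# theoretical normal quantiles within each model/method group;
# 'top' is deliberately left out of the grouping
df$theo <- ave(df$y, df$model, df$method,
               FUN = function(v) qqnorm(v, plot.it = FALSE)$x)
```

The resulting `theo` column can be passed to the same `ggplot()` call as above, with `x=theo` and `shape=top`.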


Roland answered Jun 10 '26 04:06


