I'm looking for a more convenient way to get a Q-Q plot in ggplot2
where the quantiles are computed for the data set as a whole, but where I can still use mappings (colour/shape) for groups in the data.
library(dplyr)
library(ggplot2)
library(broom) ## for augment()
Make up some data:
set.seed(1001)
N <- 1000
G <- 10
## tibble() replaces the deprecated data_frame(); later columns can refer to earlier ones
dd <- tibble(x=runif(N),
             f=factor(sample(1:G,size=N,replace=TRUE)),
             y=rnorm(N)+2*x+as.numeric(f))
m1 <- lm(y~x,data=dd)
dda <- cbind(augment(m1),f=dd$f)
Basic plot:
ggplot(dda)+stat_qq(aes(sample=.resid))
If I try to add colour, the groups get separated for the quantile computation (which I don't want):
ggplot(dda)+stat_qq(aes(sample=y,colour=f))
If I use stat_qq(aes(sample=y,colour=f,group=1)), ggplot ignores the colour specification and I get the first plot back.
I want a plot where the points are positioned as in the first case, but coloured as in the second case. I have a qqnorm-based manual solution that I can post but am looking for something nicer ...
Q-Q plots can be made in base R with qqnorm(): pass it the vector of data and it draws the Q-Q plot. (qqline() adds a reference line through that Q-Q plot, which makes it easier to see whether the relationship is linear.)
A Q-Q (quantile-quantile) plot can also be created in R with the ggplot2 package. Q-Q plots are used to check whether data follow a given distribution, most often the normal distribution. The function stat_qq() or qplot() can be used.
Examining data distributions using Q-Q plots: the points on a normal Q-Q plot give an indication of the univariate normality of the data set. If the data are normally distributed, the points fall close to a straight reference line (the 45-degree line when the data are standardized). If the data are not normally distributed, the points deviate systematically from the reference line.
You could calculate the quantiles yourself and then plot using geom_point:
dda = cbind(dda, setNames(qqnorm(dda$.resid, plot.it=FALSE), c("Theoretical", "Sample")))
ggplot(dda) +
geom_point(aes(x=Theoretical, y=Sample, colour=f))
Ah, I guess I should have read to the end of your question. This is the manual solution you were referring to, right? Although you could just package it as a function:
my_stat_qq = function(data, colour.var) {
  ## quantiles are computed once, over all of data$.resid
  data=cbind(data, setNames(qqnorm(data$.resid, plot.it=FALSE), c("Theoretical", "Sample")))
  ggplot(data) +
    geom_point(aes_string(x="Theoretical", y="Sample", colour=colour.var))
}
my_stat_qq(dda, "f")
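The function above hard-codes the .resid column. As a rough sketch of how the same idea might be generalized, the helper below takes the sample column by name and only adds the Q-Q coordinates, leaving the plotting to ordinary ggplot2 calls. The names add_qq_coords, sample_col, toy and toy_qq are made up for this illustration and are not part of ggplot2 or of the answer above.

```r
library(ggplot2)

## Hypothetical helper (not part of any package): add whole-sample normal
## Q-Q coordinates to a data frame. `sample_col` is the name of the column
## that holds the sample values.
add_qq_coords <- function(data, sample_col) {
  qq <- qqnorm(data[[sample_col]], plot.it = FALSE)
  data$Theoretical <- qq$x  # theoretical normal quantiles, in the original row order
  data$Sample <- qq$y       # the sample values themselves
  data
}

## Small made-up data set, just to exercise the helper
set.seed(1)
toy <- data.frame(resid = rnorm(50), g = factor(rep(1:5, each = 10)))
toy_qq <- add_qq_coords(toy, "resid")

## Quantiles come from the whole sample; colour is only an aesthetic here
p <- ggplot(toy_qq, aes(x = Theoretical, y = Sample, colour = g)) +
  geom_point()
```

Because the quantiles are stored as ordinary columns before ggplot() is called, any grouping aesthetic (colour, shape, facets) can be mapped without affecting where the points fall. Printing p draws the plot.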