
Q-Q plot with ggplot2::stat_qq, colours, single group

Tags: plot, r, ggplot2

I'm looking for a more convenient way to get a Q-Q plot in ggplot2 where the quantiles are computed for the data set as a whole, but where I can still use mappings (colour/shape) to distinguish groups in the data.

library(dplyr)
library(ggplot2)
library(broom) ## for augment()

Make up some data:

set.seed(1001)
N <- 1000
G <- 10
dd <- tibble(x=runif(N),  ## tibble() replaces the deprecated data_frame()
             f=factor(sample(1:G,size=N,replace=TRUE)),
             y=rnorm(N)+2*x+as.numeric(f))
m1 <- lm(y~x,data=dd)
dda <- cbind(augment(m1),f=dd$f)

Basic plot:

ggplot(dda)+stat_qq(aes(sample=.resid))


If I try to add colour, the groups get separated for the quantile computation (which I don't want):

ggplot(dda)+stat_qq(aes(sample=y,colour=f))


If I use stat_qq(aes(sample=y,colour=f,group=1)), ggplot ignores the colour specification and I get the first plot back.

I want a plot where the points are positioned as in the first case, but coloured as in the second case. I have a qqnorm-based manual solution that I can post but am looking for something nicer ...

Ben Bolker asked Mar 08 '17 18:03



1 Answer

You could calculate the quantiles yourself and then plot using geom_point:

dda = cbind(dda, setNames(qqnorm(dda$.resid, plot.it=FALSE), c("Theoretical", "Sample")))

ggplot(dda) + 
  geom_point(aes(x=Theoretical, y=Sample, colour=f))
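For reference, the theoretical quantiles that qqnorm() returns can also be reproduced by hand, which shows that they depend only on the rank of each value within the whole data set and not on any grouping. A minimal sketch, assuming simulated values in place of dda$.resid (the names `res` and `theo` are only illustrative):

```r
## Sketch: reproduce qqnorm()'s theoretical quantiles by hand.
## `res` stands in for dda$.resid; assumed to be numeric with no NAs.
set.seed(1001)
res <- rnorm(200)

## ppoints() gives the plotting positions, qnorm() converts them to
## normal quantiles, and order(order(.)) puts them back in data order.
theo <- qnorm(ppoints(length(res)))[order(order(res))]

## Compare with what qqnorm() computes
qq <- qqnorm(res, plot.it = FALSE)
all.equal(theo, qq$x)
```

Because the quantiles are attached row by row, any other column (such as f) can then be mapped to colour or shape without changing where the points fall.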


Ah, I guess I should have read to the end of your question. This is the manual solution you were referring to, right? Although you could just package it as a function:

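my_stat_qq = function(data, colour.var) {

  ## add theoretical and sample quantiles computed over the whole data set
  data=cbind(data, setNames(qqnorm(data$.resid, plot.it=FALSE), c("Theoretical", "Sample")))

  ## colour.var is a column name given as a string; .data[[...]] replaces
  ## aes_string(), which is deprecated in recent ggplot2 versions
  ggplot(data) + 
    geom_point(aes(x=Theoretical, y=Sample, colour=.data[[colour.var]])) +
    labs(colour=colour.var)

}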

my_stat_qq(dda, "f")
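
If you also want a reference line, one option is to compute it the way qqline() does by default, i.e. through the first and third quartiles, and add it with geom_abline(). The following is only a sketch under some assumptions: it builds its own small data set rather than using dda, takes the residuals with residuals() instead of broom::augment(), and the names `qq`, `slope` and `intercept` are made up for the example.

```r
library(dplyr)
library(ggplot2)

## Illustrative data in the spirit of the question (not the original dda)
set.seed(1001)
dd <- tibble(x = runif(300),
             f = factor(sample(1:3, size = 300, replace = TRUE)),
             y = rnorm(300) + 2 * x + as.numeric(f))
dd$.resid <- unname(residuals(lm(y ~ x, data = dd)))

## Theoretical quantiles from the ranks over the whole data set
qq <- dd %>%
  mutate(Theoretical = qnorm(ppoints(n()))[rank(.resid, ties.method = "first")],
         Sample = .resid)

## Line through the first and third quartiles (qqline()'s default choice)
yq <- quantile(qq$Sample, c(0.25, 0.75))
xq <- qnorm(c(0.25, 0.75))
slope <- unname(diff(yq) / diff(xq))
intercept <- unname(yq[1] - slope * xq[1])

## Points positioned by the pooled quantiles, coloured by group
p <- ggplot(qq, aes(x = Theoretical, y = Sample)) +
  geom_abline(slope = slope, intercept = intercept) +
  geom_point(aes(colour = f))
p
```

The colour mapping sits only in geom_point(), so the grouping never reaches the quantile computation.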
eipi10 answered Sep 26 '22 04:09