Visualizing the Kolmogorov-Smirnov statistic in ggplot2

The Kolmogorov-Smirnov statistic is defined as the maximum vertical distance between the empirical and the hypothesized cumulative distribution functions. Rather than just looking at the number, I would much prefer to locate the maximum difference on a graph.

I know how to plot the empirical distribution function:

library(ggplot2)
p1 <- ggplot(data.frame(x = rnorm(30)), aes(x)) + stat_ecdf(geom = "step")

but could you please tell me how to add the cumulative distribution function of the theoretical distribution to the same plot? In my case the theoretical distribution is the standard normal, but I am interested in a solution that generalizes to any distribution function.

Thank you.

JohnK asked Dec 03 '14 22:12


1 Answer

If you want to use ggplot, just do

library(ggplot2)

set.seed(15)
dd <- data.frame(x=rnorm(30))
ggplot(dd, aes(x)) +
    stat_ecdf() +
    # overlay the hypothesized CDF (standard normal)
    stat_function(fun = pnorm, colour = "red")
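
The question also asks about distributions other than the standard normal. A minimal sketch of one way to do that, assuming an Exponential(rate = 2) example that is not in the original post: `stat_function()` accepts an `args` list that it passes on to `fun`, so any `p*` function with fixed parameters can be overlaid. The names `dd_exp` and `p` are illustrative only.

```r
library(ggplot2)

set.seed(15)
# Assumed example data: 30 draws from an Exponential(rate = 2) distribution
dd_exp <- data.frame(x = rexp(30, rate = 2))

p <- ggplot(dd_exp, aes(x)) +
    stat_ecdf() +
    # `args` is handed to `fun`, so this draws pexp(x, rate = 2)
    stat_function(fun = pexp, args = list(rate = 2), colour = "red")
p
```

Swapping `pexp` and the `args` list for another CDF (for example `pgamma` with `shape` and `rate`) should work the same way, as long as the function takes the evaluation points as its first argument.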

You can find the point of maximal distance, evaluated at the observed values, with

ed <- ecdf(dd$x)
# index of the observation where |ECDF - normal CDF| is largest
maxdiffidx <- which.max(abs(ed(dd$x)-pnorm(dd$x)))
maxdiffat <- dd$x[maxdiffidx]
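
One caveat: comparing the two curves only at the observed values looks at the top of each ECDF step, while the largest gap can also occur just before a jump. The sketch below, which assumes a continuous hypothesized distribution with no ties in the data, checks both sides of every step and compares the result with `ks.test()`. The helper name `ks_exact` is hypothetical and not part of any package.

```r
# Sketch: exact one-sample KS statistic against a fully specified CDF.
# `ks_exact` is a hypothetical helper, not a function from any package.
ks_exact <- function(x, cdf, ...) {
  x <- sort(x)
  n <- length(x)
  Fx <- cdf(x, ...)
  upper <- seq_len(n) / n - Fx        # ECDF just after each jump minus CDF
  lower <- Fx - (seq_len(n) - 1) / n  # CDF minus ECDF just before each jump
  gaps <- pmax(upper, lower)
  i <- which.max(gaps)
  list(statistic = gaps[i], at = x[i])
}

set.seed(15)
x <- rnorm(30)
res <- ks_exact(x, pnorm)
res$statistic
# Cross-check against the built-in two-sided test statistic
unname(ks.test(x, "pnorm")$statistic)
```

The two printed values should agree; `res$at` gives the observation at which the largest gap occurs and can be used as the position of the vertical line in the plot.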

and add that to the plot with

ggplot(dd, aes(x)) +
    stat_ecdf() +
    stat_function(fun = pnorm, colour = "red") +
    # geom_vline() takes its position via `xintercept`, not `x`
    geom_vline(xintercept = maxdiffat, linetype = 2)
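
If a full-height vertical line is too coarse, a possible variation is to draw only the gap itself. This sketch is an assumption about what a reader might want rather than part of the original answer: it uses `annotate("segment", ...)` to connect the normal CDF and the ECDF at the point found above. The name `p_gap` is illustrative.

```r
library(ggplot2)

set.seed(15)
dd <- data.frame(x = rnorm(30))
ed <- ecdf(dd$x)
# Observation where the ECDF and the normal CDF differ most
maxdiffat <- dd$x[which.max(abs(ed(dd$x) - pnorm(dd$x)))]

p_gap <- ggplot(dd, aes(x)) +
    stat_ecdf() +
    stat_function(fun = pnorm, colour = "red") +
    # Dashed segment spanning the distance between the two curves
    annotate("segment", x = maxdiffat, xend = maxdiffat,
             y = pnorm(maxdiffat), yend = ed(maxdiffat),
             linetype = 2, colour = "blue")
p_gap
```

The length of the dashed segment is the distance between the two curves at that observation.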


MrFlick answered Sep 29 '22 00:09