I'm looking for a function to compute order statistics from normal distributions. Not a ranking or index but the expectation of a given rank or index, given a distribution and a sample size (ie. expected min, median, max). I am familiar with the analytical solution, but would not be able to solve/approximate the integrals for a normal distribution. Does anyone know of a package that has implement order statistics in R?
You are asking for a package. I don't know of any, but I don't see why you can't "solve/approximate the integrals for a normal distribution" in R. It's actually quite straightforward.
The relevant expression is Equation (1) in this paper:
Where ϕ is the PDF of N[μ, σ] and Φ is the CDF of N[μ, σ]. These functions are built into R as dnorm(...)
and pnorm(...)
respectively.
f <- function(x, mu=0, sigma=1) dnorm(x, mean=mu, sd=sigma)
F <- function(x, mu=0, sigma=1) pnorm(x, mean=mu, sd=sigma, lower.tail=FALSE)
integrand <- function(x,r,n,mu=0, sigma=1) {
x * (1 - F(x, mu, sigma))^(r-1) * F(x, mu, sigma)^(n-r) * f(x, mu, sigma)
}
E <- function(r,n, mu=0, sigma=1) {
(1/beta(r,n-r+1)) * integrate(integrand,-Inf,Inf, r, n, mu, sigma)$value
}
E(1,1000) # expected value of the minimum
# [1] -3.241436
E(1000,1000) # expected value of the maximum
# [1] 3.241436
E(500.5,1000) # expected value of the median
# [1] -6.499977e-18
EDIT Response to OP's comment.
Yes, averaging sample max's from a large number of random draws will approximate E(n,n)
. There are two differences, though. First, the answer will be approximate, whereas the result above is exact (at least to the accuracy to the numerical integration). Second, the first approach runs about 30 times faster.
E.max <- function(n) mean(sapply(1:100,function(i)max(rnorm(n))))
E.max(1000)
# [1] 3.267614
library(microbenchmark)
microbenchmark(E(1000,1000),E.max(1000))
# Unit: milliseconds
# expr min lq median uq max neval
# E(1000, 1000) 1.027536 1.169674 1.333428 1.50429 1.905828 100
# E.max(1000) 23.889773 28.882058 32.642485 37.37952 39.830501 100
Not a direct answer, but you can simply use a computer algebra system to compute the closed form densities rather trivially. Then sample such density t get your estimate for min/max/median.
See here: http://www4.stat.ncsu.edu/~hzhang/st522/08Chapter5_order.pdf
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With