Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Order statistics in R?

Tags:

r

max

min

median

I'm looking for a function to compute order statistics from normal distributions. Not a ranking or index but the expectation of a given rank or index, given a distribution and a sample size (ie. expected min, median, max). I am familiar with the analytical solution, but would not be able to solve/approximate the integrals for a normal distribution. Does anyone know of a package that has implement order statistics in R?

like image 612
user3738669 Avatar asked Oct 01 '22 13:10

user3738669


2 Answers

You are asking for a package. I don't know of any, but I don't see why you can't "solve/approximate the integrals for a normal distribution" in R. It's actually quite straightforward.

The relevant expression is Equation (1) in this paper:

Where ϕ is the PDF of N[μ, σ] and Φ is the CDF of N[μ, σ]. These functions are built into R as dnorm(...) and pnorm(...) respectively.

f <- function(x, mu=0, sigma=1) dnorm(x, mean=mu, sd=sigma)
F <- function(x, mu=0, sigma=1) pnorm(x, mean=mu, sd=sigma, lower.tail=FALSE)

integrand <- function(x,r,n,mu=0, sigma=1) {
  x * (1 - F(x, mu, sigma))^(r-1) * F(x, mu, sigma)^(n-r) * f(x, mu, sigma)
}

E <- function(r,n, mu=0, sigma=1) {
  (1/beta(r,n-r+1)) * integrate(integrand,-Inf,Inf, r, n, mu, sigma)$value
}

E(1,1000)       # expected value of the minimum
# [1] -3.241436
E(1000,1000)    # expected value of the maximum
# [1] 3.241436
E(500.5,1000)   # expected value of the median
# [1] -6.499977e-18

EDIT Response to OP's comment.

Yes, averaging sample max's from a large number of random draws will approximate E(n,n). There are two differences, though. First, the answer will be approximate, whereas the result above is exact (at least to the accuracy to the numerical integration). Second, the first approach runs about 30 times faster.

E.max <- function(n) mean(sapply(1:100,function(i)max(rnorm(n))))
E.max(1000)
# [1] 3.267614

library(microbenchmark)
microbenchmark(E(1000,1000),E.max(1000))
# Unit: milliseconds
#           expr       min        lq    median       uq       max neval
#  E(1000, 1000)  1.027536  1.169674  1.333428  1.50429  1.905828   100
#    E.max(1000) 23.889773 28.882058 32.642485 37.37952 39.830501   100
like image 149
jlhoward Avatar answered Oct 03 '22 15:10

jlhoward


Not a direct answer, but you can simply use a computer algebra system to compute the closed form densities rather trivially. Then sample such density t get your estimate for min/max/median.

See here: http://www4.stat.ncsu.edu/~hzhang/st522/08Chapter5_order.pdf

like image 45
Vlo Avatar answered Oct 03 '22 16:10

Vlo