
Finding mean of standard normal distribution in a given interval

I want to find the mean of the standard normal distribution within a given interval.

For example, if I split the standard normal distribution into two halves ([-Inf:0] and [0:Inf]), I want to get the mean of each half.

The following code does almost exactly what I want:

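divide <- 2
# Interval boundaries at equally spaced quantiles: -Inf, 0, Inf for divide = 2
boundaries <- qnorm(seq(0, 1, length.out = divide + 1))
# Simulate draws and average those falling inside each interval
t <- sort(rnorm(100000))
means.1 <- rep(NA, divide)
for (i in 1:divide) {
  means.1[i] <- mean(t[(t > boundaries[i]) & (t < boundaries[i + 1])])
}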

But I need a more precise (and more elegant) way to calculate these numbers (means.1).

I tried the following code, but it did not give the correct result (maybe because of gaps in my probability knowledge).

divide <- 2
boundaries <- qnorm(seq(0,1,length.out=divide+1))
means.2 <- rep(NA,divide)
f <- function(x) {x*dnorm(x)}
for (i in 1:divide) {
  means.2[i] <- integrate(f,lower=boundaries[i],upper=boundaries[i+1])$value
}    

Any ideas? Thanks in advance.

HBat asked Apr 12 '13 17:04


2 Answers

The problem is that dnorm(x) integrates to 0.5 over the interval (-Inf, 0), not to 1, so integrating x * dnorm(x) there does not give a mean; that is why you got the wrong answer. To correct it, divide your result by the integral of dnorm(x) over the same interval (0.5 here). For example:

func <- function(x, ...) x * dnorm(x, ...)
integrate(func, -Inf, 0, mean=0, sd=1)$value / (pnorm(0, mean=0, sd=1) - pnorm(-Inf, mean=0, sd=1)) 

Adapting it to other intervals should be easy.
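As a minimal sketch of that adaptation, the same normalization can be wrapped in a function that takes the interval as arguments. The name `truncated_mean` and its arguments are hypothetical, introduced here only for illustration; they are not part of the original answer.

```r
# Hypothetical helper: mean of a normal distribution restricted to
# the interval (lower, upper), using the same idea as above.
truncated_mean <- function(lower, upper, mean = 0, sd = 1) {
  # Unnormalized first moment over the interval
  integrand <- function(x) x * dnorm(x, mean = mean, sd = sd)
  # Probability mass of the interval, used as the normalizing constant
  mass <- pnorm(upper, mean = mean, sd = sd) - pnorm(lower, mean = mean, sd = sd)
  integrate(integrand, lower, upper)$value / mass
}

truncated_mean(-Inf, 0)  # lower half, approximately -sqrt(2 / pi)
truncated_mean(0, Inf)   # upper half, approximately  sqrt(2 / pi)
```

Passing two consecutive entries of `boundaries` from the question as `lower` and `upper` should give the mean of the corresponding interval.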

Rcoster answered Oct 06 '22 01:10


Thanks for answering my question.

I combined all the answers, as I understand them:

    library(MASS)  # provides fitdistr(), used for means.4

    divide <- 5
    boundaries <- qnorm(seq(0, 1, length.out = divide + 1))

    # My original approach: simulate, then average within each interval
    t <- sort(rnorm(1e6))
    means.1 <- rep(NA, divide)
    for (i in 1:divide) {
      means.1[i] <- mean(t[(t > boundaries[i]) & (t < boundaries[i + 1])])
    }

    # Based on @DWin: group the simulated draws by interval index
    t <- sort(rnorm(1e6))
    means.2 <- tapply(t, findInterval(t, boundaries), mean)

    # Based on @Rcoster: numerical integration, normalized by the interval's probability
    means.3 <- rep(NA, divide)
    f <- function(x, ...) x * dnorm(x, ...)
    for (i in 1:divide) {
      means.3[i] <- integrate(f, boundaries[i], boundaries[i + 1])$value /
        (pnorm(boundaries[i + 1]) - pnorm(boundaries[i]))
    }

    # Based on @Kith: fit a normal to the simulated draws in each interval
    t <- sort(rnorm(1e6))
    means.4 <- rep(NA, divide)
    for (i in 1:divide) {
      means.4[i] <- fitdistr(t[t > boundaries[i] & t < boundaries[i + 1]], densfun = "normal")$estimate[1]
    }

Results

>   means.1
[1] -1.4004895486 -0.5323784986 -0.0002590746  0.5313539906  1.3978177100
>   means.2   
[1] -1.3993590768 -0.5329465789 -0.0002875593  0.5321381745  1.3990997391 
>   means.3
[1] -1.399810e+00 -5.319031e-01  1.389222e-16  5.319031e-01  1.399810e+00
>   means.4
[1] -1.399057073 -0.531946615 -0.000250952  0.531615180  1.400086731

I believe @Rcoster's approach is the one I wanted. The others are inventive alternatives to mine, but they rely on simulated draws and so are still approximate. Thanks.
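For the standard normal specifically, the integral also has a closed form, since the antiderivative of x * dnorm(x) is -dnorm(x): the mean over (a, b) is (dnorm(a) - dnorm(b)) / (pnorm(b) - pnorm(a)). This is a standard identity rather than something proposed in the answers above, and the names `lower`, `upper` and `means.exact` below are introduced only for this sketch.

```r
divide <- 5
boundaries <- qnorm(seq(0, 1, length.out = divide + 1))

# Lower and upper end of each interval
lower <- boundaries[-length(boundaries)]
upper <- boundaries[-1]

# Closed-form interval means for the standard normal (no integrate() call);
# dnorm(-Inf) and dnorm(Inf) are both 0, so the infinite ends need no special case
means.exact <- (dnorm(lower) - dnorm(upper)) / (pnorm(upper) - pnorm(lower))
means.exact
```

These values should agree with means.3 up to numerical integration error.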

HBat answered Oct 06 '22 00:10