
Estimating the Standard Deviation of a ratio using Taylor expansion

I am interested in building an R function that I can use to test the limits of the Taylor series approximation. I am aware that there are limits to what I am doing, but it's exactly those limits I wish to investigate.

I have two normally distributed random variables x and y. x has a mean of 4 and a standard deviation (sd) of 1. y has a mean of 5 and a sd of 4.

me.x <- 4; sd.x <- 1
me.y <- 5; sd.y <- 4

I know how to estimate the mean of the ratio y/x, like this:

# E(y/x) ≈ E(y)/E(x) - Cov(y,x)/E(x)^2 + Var(x)*E(y)/E(x)^3
me.y/me.x - 0/me.x^2 + sd.x^2*me.y/me.x^3
[1] 1.328125

I am however stuck on how to estimate the standard deviation of the ratio. I realize I have to use a Taylor expansion, but I am not sure how to apply it.

Doing a simple simulation I get

x <- rnorm(10^4, mean = 4, sd = 1);  y <- rnorm(10^4, mean = 5, sd = 4)
sd(y/x)
[1] 2.027593
mean(y/x)
[1] 1.362142
asked by Eric Fail

2 Answers

There is an analytical expression for the PDF of the ratio of two Gaussians, due to David Hinkley (see e.g. Wikipedia), so in principle we could compute all moments, the mean, etc. I typed it in, and apparently the ratio clearly doesn't have a finite second moment, thus it doesn't have a finite standard deviation. Note that I've denoted your Y Gaussian as my X, and your X as my Y (the formulas assume X/Y). I get a mean value of the ratio pretty close to what you got from simulation, but the last integral is infinite, sorry. You could sample more and more values, but the sampled std. dev. grows as well, as noted by @G.Grothendieck.
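For reference, Hinkley's density for Z = X/Y, with independent X ~ N(m.x, s.x^2) and Y ~ N(m.y, s.y^2), is (using the a, b, c, d defined in the code below, and Phi for the standard normal CDF):

PDF(z) = b(z)*d(z)/a(z)^3 * 1/(sqrt(2*pi)*s.x*s.y) * [Phi(b(z)/a(z)) - Phi(-b(z)/a(z))] + exp(-c/2)/(pi*s.x*s.y*a(z)^2)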

library(ggplot2)

m.x <- 5; s.x <- 4
m.y <- 4; s.y <- 1

# Hinkley's auxiliary functions a, b, c, d
a <- function(x) {
    sqrt( (x/s.x)^2 + (1.0/s.y)^2 )
}

b <- function(x) {
    (m.x*x)/s.x^2 + m.y/s.y^2
}

# note: this masks base::c() in the global environment
c <- (m.x/s.x)^2 + (m.y/s.y)^2

d <- function(x) {
    u <- b(x)^2 - c*a(x)^2
    l <- 2.0*a(x)^2
    exp( u / l )
}

# PDF for the ratio of the two different gaussians
PDF <- function(x) {
    r <- b(x)/a(x)
    q <- pnorm(r) - pnorm(-r)

    (r*d(x)/a(x)^2) * (1.0/(sqrt(2.0*pi)*s.x*s.y)) * q + exp(-0.5*c)/(pi*s.x*s.y*a(x)^2)
}

# normalization
nn <- integrate(PDF, -Inf, Inf)
nn <- nn[["value"]]

# plot PDF
p <- ggplot(data = data.frame(x = 0), mapping = aes(x = x))
p <- p + stat_function(fun = function(x) PDF(x)/nn) + xlim(-2.0, 6.0)
print(p)

# first moment
m1 <- integrate(function(x) x*PDF(x), -Inf, Inf)
m1 <- m1[["value"]]

# mean
print(m1/nn)

# some sampling
set.seed(32345)
n <- 10^7L
x <- rnorm(n, mean = m.x, sd = s.x); y <- rnorm(n, mean = m.y, sd = s.y)
print(mean(x/y))
print(sd(x/y))

# second moment - divergent! (integrate() will not converge)
m2 <- integrate(function(x) x*x*PDF(x), -Inf, Inf)

Thus, there is no finite standard deviation to compare a Taylor expansion against: it is impossible to test the approximation for the std. dev. here.
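As a quick numerical illustration of the divergence (my addition, reusing the PDF and nn defined above): the truncated second moment keeps growing as the integration cutoff widens, which is exactly what integrating x^2 against a density with 1/x^2 tails looks like.

# truncated second moment: grows without bound as the cutoff L increases
for (L in c(10L, 100L, 1000L, 10000L)) {
    m2.trunc <- integrate(function(x) x*x*PDF(x)/nn, -L, L)$value
    cat(sprintf("cutoff = %6d   truncated E[Z^2] = %.2f\n", L, m2.trunc))
}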

answered by Severin Pappadeux


With the cautions suggested by @G.Grothendieck in mind: a useful mnemonic for products and quotients of independent X and Y variables is

CV^2(X/Y) ≈ CV^2(X*Y) ≈ CV^2(X) + CV^2(Y)

where CV is the coefficient of variation (sd(X)/mean(X)), so CV^2 is Var/mean^2. In other words

Var(Y/X)/(m(Y/X))^2 ≈ Var(X)/m(X)^2 + Var(Y)/m(Y)^2

or, rearranging,

sd(Y/X) ≈ sqrt[ Var(X)*m(Y/X)^2/m(X)^2 + Var(Y)*m(Y/X)^2/m(Y)^2 ]

For random variables with the mean well away from zero, this is a reasonable approximation.

set.seed(101)
y <- rnorm(1000,mean=5)
x <- rnorm(1000,mean=10)
myx <- mean(y/x)
sqrt(var(x)*myx^2/mean(x)^2 + var(y)*myx^2/mean(y)^2)  ## 0.110412
sd(y/x)  ## 0.1122373

Using your example, the results are considerably worse because the CV of Y is close to 1. I initially thought the approximation looked OK, but now I see that it's biased as well as not capturing the variability very well. (I'm also plugging in the expected values of the mean and SD rather than their simulated values, but for such a large sample that should be a minor part of the error.)

me.x <- 4; sd.x <- 1
me.y <- 5; sd.y <- 4
myx <- me.y/me.x - 0/me.x^2 + sd.x^2*me.y/me.x^3
x <- rnorm(1e4,me.x,sd.x); y <- rnorm(1e4,me.y,sd.y)
c(myx,mean(y/x))
sdyx <- sqrt(sd.x^2*myx^2/me.x^2 + sd.y^2*myx^2/me.y^2)
c(sdyx,sd(y/x))    
## 1.113172 1.197855

rvals <- replicate(1000,
    sd(rnorm(1e4,me.y,sd.y)/rnorm(1e4,me.x,sd.x)))
hist(log(rvals),col="gray",breaks=100)
abline(v=log(sdyx),col="red",lwd=2)
min(rvals)  ## 1.182698

[Figure: histogram of log(sd(y/x)) over the 1000 replicates, with a red vertical line at log(sdyx); the delta-method estimate lies below every simulated value.]

All the canned delta-method approaches to computing the variance of Y/X use the point estimate for Y/X (i.e. m(Y/X) = m(Y)/m(X)) rather than the second-order approximation you used above. Constructing higher-order forms for both the mean and the variance should be straightforward, if tedious (a computer algebra system might help; see the sketch after the code below).

mvec <- c(x = me.x, y = me.y)
V <- diag(c(sd.x, sd.y)^2)
car::deltaMethod(mvec, "y/x", V)
##     Estimate       SE
## y/x     1.25 1.047691

library(emdbook)
sqrt(deltavar(y/x,meanval=mvec,Sigma=V)) ## 1.047691

sqrt(sd.x^2*(me.y/me.x)^2/me.x^2 + sd.y^2*(me.y/me.x)^2/me.y^2)  ## 1.047691
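As a minimal sketch of the computer-algebra route (my addition, using only base R's symbolic D(), not any of the packages above), the second-order mean correction can be generated automatically and reproduces the hand-coded value from the question:

# second derivative of g(x,y) = y/x with respect to x, via symbolic D()
g <- quote(y/x)
gxx <- D(D(g, "x"), "x")   # 2*y/x^3
# second-order delta-method mean: E(g) ≈ g(mu) + 0.5*Var(x)*g_xx(mu)
# (the Cov(x,y) term is zero here because x and y are independent)
me.y/me.x + 0.5*sd.x^2*eval(gxx, list(x = me.x, y = me.y))
## [1] 1.328125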

For what it's worth, I took the code in @SeverinPappadeux's answer and made it into a function gratio(mx,my,sx,sy). For the Cauchy case (gratio(0,0,1,1)) it gets confused and reports a mean of 0 (which should be NA/divergent) but correctly reports the variance/std dev as divergent. For the parameters specified by the OP (gratio(5,4,4,1)) it gives mean=1.352176, sd=NA as above. For the first parameters I tried above (gratio(10,5,1,1)) it gives mean=0.5051581, sd=0.1141726.
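A minimal sketch of what such a wrapper might look like (my reconstruction, not Ben Bolker's actual code): it reuses Severin Pappadeux's PDF and wraps integrate() in tryCatch() so that divergent moments come back as NA rather than an error.

gratio <- function(mx, my, sx, sy) {
    # Hinkley's auxiliary functions (cc avoids masking base::c)
    a <- function(z) sqrt((z/sx)^2 + (1/sy)^2)
    b <- function(z) mx*z/sx^2 + my/sy^2
    cc <- (mx/sx)^2 + (my/sy)^2
    PDF <- function(z) {
        r <- b(z)/a(z)
        q <- pnorm(r) - pnorm(-r)
        (r*exp((b(z)^2 - cc*a(z)^2)/(2*a(z)^2))/a(z)^2) *
            (1/(sqrt(2*pi)*sx*sy)) * q +
            exp(-cc/2)/(pi*sx*sy*a(z)^2)
    }
    # NA instead of an error when integrate() cannot converge
    safe <- function(f) tryCatch(integrate(f, -Inf, Inf)$value,
                                 error = function(e) NA)
    nn <- safe(PDF)
    m1 <- safe(function(z) z*PDF(z))/nn
    m2 <- safe(function(z) z^2*PDF(z))/nn
    c(mean = m1, sd = sqrt(m2 - m1^2))
}

gratio(5, 4, 4, 1)  ## mean ~ 1.35, sd NA (second moment diverges)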

These numerical experiments suggest that the ratio of Gaussians can sometimes appear to have a well-defined variance, but I suspect those finite values are quadrature artifacts: the second term of Hinkley's PDF decays like 1/x^2 for any parameter values, which makes the second moment diverge (time for another question on Math StackExchange or CrossValidated?).

answered by Ben Bolker