Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generating lognormally distributed random number from mean, coeff of variation

Most functions for generating lognormally distributed random numbers take the mean and standard deviation of the associated normal distribution as parameters.

My problem is that I only know the mean and the coefficient of variation of the lognormal distribution. It is reasonably straight forward to derive the parameters I need for the standard functions from what I have:

If mu and sigma are the mean and standard deviation of the associated normal distribution, we know that

coeffOfVar^2 = variance / mean^2
             = (exp(sigma^2) - 1) * exp(2*mu + sigma^2) / exp(mu + sigma^2/2)^2
             = exp(sigma^2) - 1

We can rearrange this to

sigma = sqrt(log(coeffOfVar^2 + 1))

We also know that

mean = exp(mu + sigma^2/2)

This rearranges to

mu = log(mean) - sigma^2/2

Here's my R implementation

rlnorm0 <- function(mean, coeffOfVar, n = 1e6)
{
   sigma <- sqrt(log(coeffOfVar^2 + 1))
   mu <- log(mean) - sigma^2 / 2
   rlnorm(n, mu, sigma)
}

It works okay for small coefficients of variation

r1 <- rlnorm0(2, 0.5)
mean(r1)                 # 2.000095
sd(r1) / mean(r1)        # 0.4998437

But not for larger values

r2 <- rlnorm0(2, 50)
mean(r2)                 # 2.048509
sd(r2) / mean(r2)        # 68.55871

To check that it wasn't an R-specific issue, I reimplemented it in MATLAB. (Uses stats toolbox.)

function y = lognrnd0(mean, coeffOfVar, sizeOut)
if nargin < 3 || isempty(sizeOut)
   sizeOut = [1e6 1];
end
sigma = sqrt(log(coeffOfVar.^2 + 1));
mu = log(mean) - sigma.^2 ./ 2;
y = lognrnd(mu, sigma, sizeOut);
end

r1 = lognrnd0(2, 0.5);
mean(r1)                  % 2.0013
std(r1) ./ mean(r1)       % 0.5008

r2 = lognrnd0(2, 50);
mean(r2)                  % 1.9611
std(r2) ./ mean(r2)       % 22.61

Same problem. The question is, why is this happening? Is it just that the standard deviation is not robust when the variation is that wide? Or have I screwed up somewhere?

like image 290
Richie Cotton Avatar asked Jan 23 '23 09:01

Richie Cotton


1 Answers

The results are not surprising. For distributions with large kurtosis, expected variance of the sample variance is roughly mu4/N, where mu4 is the 4th moment of the distribution. For a lognormal, mu4 exponentially depends on the parameter sigma^2, meaning that for large enough values of sigma, your sample variance will be all over the place relative to the true variance. This is precisely what you have observed. In your example, mu4/N ~ (coeffOfVar^8)/N ~ 50^8/1e6 ~ 4e7.

For derivation of the expected variance of sample var. see http://mathworld.wolfram.com/SampleVarianceDistribution.html. Below is some code to illustrate the ideas in a more exact manner. Note the large value of both the variance of the sample variance and its theoretical expected value, even for coeffOfVar = 5.

exp.var.of.samp.var <- function(n,mu2,mu4){
  (n-1)*((n-1)*mu4-(n-3)*mu2^2)/n^3
}
mu2.lnorm <- function(mu,sigma){
  (exp(sigma^2)-1)*exp(2*mu+sigma^2)
}
mu4.lnorm <- function(mu,sigma){
  mu2.lnorm(mu,sigma)^2*(exp(4*sigma^2)+2*exp(3*sigma^2)+3*exp(2*sigma^2)-3)
}
exp.var.lnorm.var <- function(n,mu,sigma){
  exp.var.of.samp.var(n,mu2.lnorm(mu,sigma),mu4.lnorm(mu,sigma))
}
exp.var.norm.var <- function(n,mu,sigma){
  exp.var.of.samp.var(n,sigma^2,3*sigma^4)
}    

coeffOfVar <- 5 
mean <- 2
sigma <- sqrt(log(coeffOfVar^2 + 1)) # gives sigma=1.805020
mu <- log(mean) - sigma^2 / 2 # mu=-0.935901
n <- 1e4
m <- 1e4
## Get variance of sample variance for lognormal distribution:
var.trial <- replicate(m,var(rlnorm(n, mu, sigma)))
cat("samp. variance (mean of",m,"trials):",mean(var.trial),"\n")
cat("theor. variance:",mu2.lnorm(mu,sigma),"\n")
cat("variance of the sample var:",var(var.trial),"\n")
cat("expected variance of the sample var:",exp.var.lnorm.var(n,mu,sigma),"\n")
> samp. variance (mean of 10000 trials): 105.7192 
> theor. variance: 100 
> variance of the sample var: 350997.7 
> expected variance of the sample var: 494053.2 
## Do this with normal distribution:
var.trial <- replicate(m,var(rnorm(n, mu, sigma)))
cat("samp. variance (mean of",m,"trials):",mean(var.trial),"\n")
cat("theor. variance:",sigma^2,"\n")
cat("variance of the sample var:",var(var.trial),"\n")
cat("expected variance of the sample var:",exp.var.norm.var(n,mu,sigma),"\n")
> samp. variance (mean of 10000 trials): 3.257944 
> theor. variance: 3.258097 
> variance of the sample var: 0.002166131 
> expected variance of the sample var: 0.002122826 
like image 127
Leo Alekseyev Avatar answered Jan 24 '23 22:01

Leo Alekseyev