Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I generate a sample from a log-normal distribution with Pareto tail in R?

Tags:

r

power-law

I would like to generate a sample from a log-normal distribution with Pareto tail in R. Can somebody help me? Thanks.

like image 832
stochazesthai Avatar asked Dec 25 '22 10:12

stochazesthai


1 Answers

I'm not sure this is what you're looking for, but there is a fair amount of literature on the topic of Double Pareto Log-normal Distributions, or so-ca.led dPlN. See this, and this, and this. These are used to simulate everything from the distribution of earnings and income, to oil field size, to internet traffic.

If this is not what you're looking for, let me know and I'll delete the post.

You ask how to generate a random sample from dPlN. Theoretically, it is possible to generate a random sample from an arbitrary distribution by generating a random sample from the uniform distribution, U[0,1] and transforming that using the inverse CDF of the target distribution.

So first, we need the PDF of the dPlN, then we integrate that to find the CDF, then we invert that to find the inverse CDF. The PDF of dPlN is given in the first reference in Eqn 8 and 9:

where α and β are the location parameters and ν and τ2 are the mean and variance of the log-normal distribution. Φ and Φc are the CDF and complementary CDF of N[0,1]. Crudely, smaller α and β mean longer tail, ν influences the position of the peak, and τ influences the width of the peak.

So in R, we calculate the PDF, CDF, and inverse CDF of dPlN as follows:

f = function(x,alpha, beta, nu, tau) {   # probability density of dPlN
  A = function(theta, nu, tau) exp(theta*nu +(alpha*tau)^2/2)
  c = alpha*beta/(alpha+beta)
  z.alpha = (log(x) - nu - alpha*tau^2)/tau
  z.beta  = (log(x) - nu + beta*tau^2)/tau
  t.alpha = x^-(alpha+1)*A(alpha,nu,tau)*pnorm(z.alpha)
  t.beta  = x^(beta-1)*A(-beta,nu,tau)*(1-pnorm(z.beta))
  return(c*(t.alpha + t.beta))
}
F = function(x,alpha,beta,nu,tau) {      # cumulative density function of dPlN
  ifelse(x > 1e4, 1, integrate(f,0.001,x,alpha,beta,nu,tau)$value)}
F = Vectorize(F, vectorize.args="x")

F.inv = function(y, alpha,beta,nu,tau){  # inverse CDF of dPlN
  uniroot(function(x, alpha,beta,nu,tau){F(x, alpha,beta,nu,tau)-y},
          interval=c(0,1e6),alpha,beta,nu,tau)$root
}
F.inv = Vectorize(F.inv, vectorize.args="y")

x=seq(0,50,length.out=1000)
y=seq(0,.995,length.out=1000)

par(mfrow=c(1,3))
plot(x,f(x,2,2,2,1),type="l",main="f(x)")
plot(x,F(x,2,2,2,1),type="l",main="CDF of f(x)")
plot(y,F.inv(y,2,2,2,1),type="l",main="Inverse CDF of f(x)")

Finally, we use F.inv(...) to generate random variates of the dPlN, and plot the results to demonstrate that the random sample does indeed follow the expected probability distribution.

# random sample from dPlN (double Pareto Lognormal distribution)
X = runif(1000,0,1)   # random sample from U[0,1]
Z = F.inv(X,2,2,2,1)

par(mfrow=c(1,1))
hist(Z, breaks=c(seq(min(x),max(x),length=50),Inf), 
     xlim=range(x), freq=FALSE)
lines(x,f(x,2,2,2,1),main="Density function",
      xlim=range(x), col="red", lty=2)

Disclaimer This code has not been tested with all possible values of alpha, beta, nu, and tau, so there are no guarantees it will work under all circumstances.

like image 85
jlhoward Avatar answered Dec 29 '22 18:12

jlhoward