Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generate random numbers from lognormal distribution in python

I need to generate pseudo-random numbers from a lognormal distribution in Python. The problem is that I am starting from the mode and standard deviation of the lognormal distribution. I don't have the mean or median of the lognormal distribution, nor any of the parameters of the underlying normal distribution.

numpy.random.lognormal takes the mean and standard deviation of the underlying normal distribution. I tried to calculate these from the parameters I have, but wound up with a quartic function. It has a solution, but I hope that there is a more straightforward way to do this.

scipy.stats.lognorm takes parameters that I don't understand. I am not a native English speaker and the documentation doesn't make sense.

Can you help me, please?

like image 362
Bobesh Avatar asked Jan 04 '17 13:01

Bobesh


People also ask

How do you generate a random number from a log-normal distribution in Python?

You can use the lognorm() function from the SciPy library in Python to generate a random variable that follows a log-normal distribution.

How are lognormal random numbers generated?

r = lognrnd( mu , sigma ) generates a random number from the lognormal distribution with the distribution parameters mu (mean of logarithmic values) and sigma (standard deviation of logarithmic values).

How do you simulate lognormal data in Python?

The method is simple: you use the RAND function to generate X ~ N(μ, σ), then compute Y = exp(X). The random variable Y is lognormally distributed with parameters μ and σ. This is the standard definition, but notice that the parameters are specified as the mean and standard deviation of X = log(Y).


1 Answers

You have the mode and the standard deviation of the log-normal distribution. To use the rvs() method of scipy's lognorm, you have to parameterize the distribution in terms of the shape parameter s, which is the standard deviation sigma of the underlying normal distribution, and the scale, which is exp(mu), where mu is the mean of the underlying distribution.

You pointed out that making this reparameterization requires solving a quartic polynomial. For that, we can use the numpy.poly1d class. Instances of that class have a roots attribute.

A little algebra shows that exp(sigma**2) is the unique positive real root of the polynomial

x**4 - x**3 - (stddev/mode)**2 = 0

where stddev and mode are the given standard deviation and mode of the log-normal distribution, and for that solution, the scale (i.e. exp(mu)) is

scale = mode*x

Here's a function that converts the mode and standard deviation to the shape and scale:

def lognorm_params(mode, stddev):
    """
    Given the mode and std. dev. of the log-normal distribution, this function
    returns the shape and scale parameters for scipy's parameterization of the
    distribution.
    """
    p = np.poly1d([1, -1, 0, 0, -(stddev/mode)**2])
    r = p.roots
    sol = r[(r.imag == 0) & (r.real > 0)].real
    shape = np.sqrt(np.log(sol))
    scale = mode * sol
    return shape, scale

For example,

In [155]: mode = 123

In [156]: stddev = 99

In [157]: sigma, scale = lognorm_params(mode, stddev)

Generate a sample using the computed parameters:

In [158]: from scipy.stats import lognorm

In [159]: sample = lognorm.rvs(sigma, 0, scale, size=1000000)

Here's the standard deviation of the sample:

In [160]: np.std(sample)
Out[160]: 99.12048952171304

And here's some matplotlib code to plot a histogram of the sample, with a vertical line drawn at the mode of the distribution from which the sample was drawn:

In [176]: tmp = plt.hist(sample, normed=True, bins=1000, alpha=0.6, color='c', ec='c')

In [177]: plt.xlim(0, 600)
Out[177]: (0, 600)

In [178]: plt.axvline(mode)
Out[178]: <matplotlib.lines.Line2D at 0x12c5a12e8>

The histogram:

histogram


If you want to generate the sample using numpy.random.lognormal() instead of scipy.stats.lognorm.rvs(), you can do this:

In [200]: sigma, scale = lognorm_params(mode, stddev)

In [201]: mu = np.log(scale)

In [202]: sample = np.random.lognormal(mu, sigma, size=1000000)

In [203]: np.std(sample)
Out[203]: 99.078297384090902

I haven't looked into how robust poly1d's roots algorithm is, so be sure to test for a wide range of possible input values. Alternatively, you can use a solver from scipy to solve the above polynomial for x. You can bound the solution using:

max(sqrt(stddev/mode), 1) <= x <= sqrt(stddev/mode) + 1
like image 107
Warren Weckesser Avatar answered Sep 28 '22 10:09

Warren Weckesser