I fail to understand the very basics of creating lognormal variables as documented here.
The lognormal distribution takes a mean and a variance as parameters. I would like to create a frozen distribution using these parameters and then get the cdf, pdf, etc.
However, in the documentation, they get the frozen distribution using
from scipy.stats import lognorm
s = 0.953682269606
rv = lognorm(s)
's' seems to be the standard deviation. I tried to use the 'loc' and 'scale' parameters instead of 's', but that generated an error (s is a required parameter). How can I generate a frozen distribution with parameter values 'm', 's' for location and scale?
You can use the lognorm() function from the SciPy library in Python to generate a random variable that follows a log-normal distribution.
A log-normal distribution is a continuous distribution of a random variable whose natural logarithm is normally distributed. For example, if the random variable y = exp(x) has a log-normal distribution, then x = log(y) has a normal distribution.
The method is simple: you draw X ~ N(μ, σ) from a normal random number generator, then compute Y = exp(X). The random variable Y is lognormally distributed with parameters μ and σ. This is the standard definition, but notice that the parameters are specified as the mean and standard deviation of X = log(Y), not of Y itself.
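A minimal sketch of that construction, assuming NumPy's `default_rng` as the normal generator and arbitrary illustrative values for μ and σ (the seed and sample size are my choices, not part of the original text). It compares the sample median of Y = exp(X) with the median of a frozen SciPy distribution built with `s=sigma` and `scale=exp(mu)`:

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 0.5, 0.75  # illustrative parameters of X = log(Y)

rng = np.random.default_rng(12345)                 # fixed seed for reproducibility
x = rng.normal(loc=mu, scale=sigma, size=100_000)  # X ~ N(mu, sigma)
y = np.exp(x)                                      # Y = exp(X) is lognormal

# Frozen SciPy distribution with the same parameters:
# shape s = sigma, scale = exp(mu), loc left at 0.
rv = lognorm(s=sigma, loc=0, scale=np.exp(mu))

# The median of a lognormal variable is exp(mu), so all three should be close.
sample_median = np.median(y)
print(sample_median, rv.median(), np.exp(mu))
```

The median is used here because it converges much faster than the mean for a heavy-tailed distribution.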
The lognormal distribution differs from the normal distribution in several ways. A major difference is in its shape: the normal distribution is symmetrical, whereas the lognormal distribution is not. Because the values in a lognormal distribution are positive, they create a right-skewed curve.
ln(scale) corresponds to μ (!)
s (the shape parameter) corresponds to σ
loc is not needed for setting either σ or μ
I think it is a severe problem that this is not clearly documented. I guess many have fallen for this when doing simple tests with the lognormal distribution in SciPy.
The stats module treats loc and scale the same for all distributions (this is not explicitly written down, but can be inferred when reading between the lines). My suspicion was that loc is subtracted from x, and the result is divided by scale (and the result is treated as the new x). I tested for that, and this turned out to be the case.
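One way to run such a test is sketched below. The values of `s`, `loc` and `scale` are arbitrary illustrations. Note that for the pdf the substitution comes with an extra factor 1/scale, which keeps the density normalized:

```python
import numpy as np
from scipy.stats import lognorm

s, loc, scale = 0.9, 1.5, 3.0   # arbitrary illustrative values
x = np.linspace(2.0, 20.0, 50)  # points above loc, inside the support

z = (x - loc) / scale           # subtract loc, then divide by scale

# CDF: evaluating at x with loc/scale equals evaluating the standard form at z.
cdf_match = np.allclose(lognorm.cdf(x, s, loc=loc, scale=scale),
                        lognorm.cdf(z, s))

# PDF: same substitution, plus a factor 1/scale so the density integrates to 1.
pdf_match = np.allclose(lognorm.pdf(x, s, loc=loc, scale=scale),
                        lognorm.pdf(z, s) / scale)

print(cdf_match, pdf_match)
```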
What does this mean for the lognormal distribution? In the canonical definition of the lognormal distribution, the term ln(x) appears. Obviously, the same term appears in SciPy's implementation. Given the considerations above, this is how loc and scale end up in the logarithm:
ln((x-loc)/scale)
By the usual logarithm rules, this is the same as
ln(x-loc) - ln(scale)
In the canonical definition of the lognormal distribution, the term is simply ln(x) - μ. Comparing SciPy's approach and the canonical approach then provides the crucial insight: ln(scale) represents μ. loc, however, has no correspondence in the canonical definition and is better left at 0. Further below, I have argued for the fact that shape (s) is σ.
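Put together, the mapping can be wrapped in a small helper. This is only a sketch: `lognorm_from_mu_sigma` is a hypothetical name of mine, not a SciPy function. The check uses the fact that P(Y <= y) equals the normal CDF of log(y) with mean μ and standard deviation σ:

```python
import numpy as np
from scipy.stats import lognorm, norm

def lognorm_from_mu_sigma(mu, sigma):
    """Hypothetical helper: frozen lognormal with log(Y) ~ N(mu, sigma)."""
    return lognorm(s=sigma, loc=0, scale=np.exp(mu))

rv = lognorm_from_mu_sigma(mu=2.0, sigma=2.0)

y = np.array([0.5, 1.0, 5.0, 50.0])
# If Y is lognormal, then P(Y <= y) = P(log(Y) <= log(y)), a normal CDF.
cdf_via_normal = norm.cdf(np.log(y), loc=2.0, scale=2.0)
cdf_ok = np.allclose(rv.cdf(y), cdf_via_normal)
print(cdf_ok)
```

The frozen object `rv` then provides `pdf`, `cdf`, `ppf`, `rvs` and so on, which is what the question asked for.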
>>> import math
>>> from scipy.stats import lognorm
>>> mu = 2
>>> sigma = 2
>>> l = lognorm(s=sigma, loc=0, scale=math.exp(mu))
>>> print("mean: %.5f stddev: %.5f" % (l.mean(), l.std()))
mean: 54.59815 stddev: 399.71719
I use WolframAlpha as a reference. It provides analytically determined values for the mean and standard deviation of the lognormal distribution.
http://www.wolframalpha.com/input/?i=log-normal+distribution%2C+mean%3D2%2C+sd%3D2
The values match.
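The same numbers can also be checked without an external service, using the standard closed-form expressions for the lognormal distribution: mean = exp(μ + σ²/2) and variance = (exp(σ²) - 1) * exp(2μ + σ²). The sketch below also inverts these formulas, which is what you need if you start from the mean and variance of the lognormal variable itself (as in the question) rather than from μ and σ:

```python
import math
from scipy.stats import lognorm

mu, sigma = 2.0, 2.0
rv = lognorm(s=sigma, loc=0, scale=math.exp(mu))

# Closed-form mean and standard deviation of the lognormal variable Y.
mean_closed = math.exp(mu + sigma**2 / 2)
std_closed = math.sqrt((math.exp(sigma**2) - 1) * math.exp(2 * mu + sigma**2))

print("closed form: mean %.5f stddev %.5f" % (mean_closed, std_closed))
print("scipy:       mean %.5f stddev %.5f" % (rv.mean(), rv.std()))

# Inverting the formulas: recover mu and sigma from the mean and variance of Y.
m, v = rv.mean(), rv.var()
sigma_back = math.sqrt(math.log(1 + v / m**2))
mu_back = math.log(m) - sigma_back**2 / 2
print("recovered mu: %.5f sigma: %.5f" % (mu_back, sigma_back))
```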
WolframAlpha as well as SciPy come up with the mean and standard deviation by evaluating analytical terms. Let's perform an empirical test by taking many samples from the SciPy distribution and calculating their mean and standard deviation "manually" (from the whole set of samples):
>>> import numpy as np
>>> samples = l.rvs(size=2*10**7)
>>> print("mean: %.5f stddev: %.5f" % (np.mean(samples), np.std(samples)))
mean: 54.52148 stddev: 380.14457
This is still not perfectly converged, but I think it is proof enough that the samples correspond to the same distribution that WolframAlpha assumed, given μ=2 and σ=2.
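The slow convergence comes from the heavy right tail. A check that converges much faster is to take the logarithm of the samples, which should be normally distributed with mean μ and standard deviation σ. A sketch, with an arbitrary seed and sample size of my choosing:

```python
import math
import numpy as np
from scipy.stats import lognorm

mu, sigma = 2.0, 2.0
rv = lognorm(s=sigma, loc=0, scale=math.exp(mu))

# Fixed seed so the run is reproducible; the sample size is an arbitrary choice.
samples = rv.rvs(size=200_000, random_state=12345)

log_samples = np.log(samples)  # should be approximately N(mu, sigma)
mu_hat = log_samples.mean()
sigma_hat = log_samples.std()
print("mu_hat: %.3f sigma_hat: %.3f" % (mu_hat, sigma_hat))
```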
And another small edit: it looks like proper usage of a search engine would have helped, we were not the first to be trapped by this:
https://stats.stackexchange.com/questions/33036/fitting-log-normal-distribution-in-r-vs-scipy
http://nbviewer.ipython.org/url/xweb.geos.ed.ac.uk/~jsteven5/blog/lognormal_distributions.ipynb
scipy, lognormal distribution - parameters
Another edit: now that I know how it behaves, I realize that the behavior is, in principle, documented. In the "notes" section we can read:
with shape parameter sigma and scale parameter exp(mu)
It is just really not obvious (we both were not able to appreciate the importance of this small sentence). I guess the reason that we could not understand what this sentence means is that the analytical expression shown in the notes section does not include loc and scale. I guess this is worth a bug report / documentation improvement.
Original answer:
Indeed, the shape parameter topic is not well-documented when looking into the docs page for a particular distribution. I recommend having a look at the main stats documentation -- there is a section on shape parameters:
http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html#shape-parameters
It looks like there should be a lognorm.shapes property, telling you what the s parameter means, specifically.
Edit: There is only one parameter, indeed:
>>> lognorm.shapes
's'
When comparing the general definition of the lognormal distribution (from Wikipedia):
f(x) = 1 / (x*σ*sqrt(2*pi)) * exp(-(ln(x) - μ)**2 / (2*σ**2))
and the formula given by the scipy docs:
lognorm.pdf(x, s) = 1 / (s*x*sqrt(2*pi)) * exp(-1/2*(log(x)/s)**2)
it becomes obvious that s is the true σ (sigma).
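The formula from the docs can be transcribed directly and compared with `lognorm.pdf` at the default `loc=0` and `scale=1`. In this sketch `lognorm_pdf_formula` is a hypothetical helper name, and the grid of x values is arbitrary:

```python
import numpy as np
from scipy.stats import lognorm

def lognorm_pdf_formula(x, s):
    """Direct transcription of the documented formula (loc=0, scale=1)."""
    return 1.0 / (s * x * np.sqrt(2 * np.pi)) * np.exp(-0.5 * (np.log(x) / s) ** 2)

x = np.linspace(0.1, 10.0, 100)
s = 0.953682269606  # the value used in the question

formula_matches = np.allclose(lognorm.pdf(x, s), lognorm_pdf_formula(x, s))
print(formula_matches)
```

With μ = 0 the general definition reduces to exactly this expression once σ is replaced by s.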
However, from the docs it is not obvious how the loc parameter is related to μ (mu).
It could be as in ln(x-loc), which would not correspond to μ in the general formula, or it could be ln(x)-loc, which would ensure correspondence between loc and μ. Try it out! :)
Edit 2
I have made comparisons between what WolframAlpha (WA) and SciPy say. WA is pretty clear that it uses μ and σ as generally understood (as defined in the linked Wikipedia article).
>>> l = lognorm(s=2, loc=0)
>>> print("mean: %.5f stddev: %.5f" % (l.mean(), l.std()))
mean: 7.38906 stddev: 54.09584
This matches WA's output.
Now, for loc not being zero, there is a mismatch. Example:
>>> l = lognorm(s=2, loc=1)
>>> print("mean: %.5f stddev: %.5f" % (l.mean(), l.std()))
mean: 8.38906 stddev: 54.09584
WA gives a mean of 20.08 and a standard deviation of 147. There you have it: loc does not correspond to μ in the classical definition of the lognormal distribution.
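To make the difference concrete, here is a short sketch comparing three frozen distributions: the base case, the same distribution with `loc=1`, and the one that actually corresponds to μ=1, obtained through `scale=exp(1)`. The variable names are my own:

```python
import math
from scipy.stats import lognorm

base = lognorm(s=2, loc=0)                  # mu = 0, sigma = 2
shifted = lognorm(s=2, loc=1)               # same distribution moved right by 1
mu_one = lognorm(s=2, loc=0, scale=math.e)  # mu = 1 via scale = exp(1)

# loc only shifts the distribution: the mean moves by loc, the spread is unchanged.
print("shifted: mean %.5f stddev %.5f" % (shifted.mean(), shifted.std()))

# Setting mu through scale changes both the mean and the standard deviation.
print("mu = 1:  mean %.5f stddev %.5f" % (mu_one.mean(), mu_one.std()))
```

The second line of output is the one to compare with WA's values for μ=1 and σ=2.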