Log Normal Random Variables with Scipy

Tags:

scipy

I fail to understand the very basics of creating lognormal variables as documented here.

The lognormal distribution takes a mean and a variance as parameters. I would like to create a frozen distribution from these parameters and then get its cdf, pdf, etc.

However, in the documentation, they get the frozen distribution using

from scipy.stats import lognorm
s = 0.953682269606
rv = lognorm(s)

's' seems to be the standard deviation. I tried to use the 'loc' and 'scale' parameters instead of 's', but that raised an error (s is a required parameter). How can I create a frozen distribution with parameter values 'm' and 's' for location and scale?


asked Feb 24 '15 16:02

FooBar

1 Answer

The mystery is solved (edit 3)

μ corresponds to ln(scale) (!)
σ corresponds to shape (s)
loc is not needed for setting either σ or μ
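As a minimal sketch of the mapping above, a frozen distribution for a given μ and σ could be built like this. The values of mu and sigma are illustrative assumptions, not numbers from the question.

```python
import math
from scipy.stats import lognorm

# Illustrative values (assumed): mean and standard deviation of ln(X).
mu, sigma = 2.0, 0.5

# sigma goes into the shape parameter s, exp(mu) into scale, loc stays 0.
rv = lognorm(s=sigma, loc=0, scale=math.exp(mu))

x = math.exp(mu)             # exp(mu) is the median of the distribution
print("pdf:", rv.pdf(x))
print("cdf:", rv.cdf(x))     # 0.5, because x is the median
print("mean:", rv.mean())    # analytically exp(mu + sigma**2 / 2)
```

The frozen object rv then offers pdf, cdf, ppf, rvs and so on without repeating the parameters.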

I think it is a severe problem that this is not clearly documented. I guess many have fallen for this when doing simple tests with the lognormal distribution in SciPy.

Why is that?

The stats module treats loc and scale the same way for all distributions (this is not stated explicitly, but can be inferred by reading between the lines). My suspicion was that loc is subtracted from x and the result is divided by scale, with the quotient then treated as the new x. I tested this, and it turned out to be the case.
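One way to run such a test is sketched below. The parameter values and the grid of x values are arbitrary assumptions. Note that the density additionally picks up a factor 1/scale from the change of variables, while the cdf does not.

```python
import numpy as np
from scipy.stats import lognorm

# Arbitrary illustrative values (assumed).
s, loc, scale = 0.9, 1.5, 3.0
x = np.linspace(2.0, 20.0, 50)   # all points lie above loc

# Evaluate with loc and scale passed to SciPy ...
pdf_direct = lognorm.pdf(x, s, loc=loc, scale=scale)
cdf_direct = lognorm.cdf(x, s, loc=loc, scale=scale)

# ... and by standardizing x by hand: y = (x - loc) / scale.
y = (x - loc) / scale
pdf_by_hand = lognorm.pdf(y, s) / scale
cdf_by_hand = lognorm.cdf(y, s)

print(np.allclose(pdf_direct, pdf_by_hand))
print(np.allclose(cdf_direct, cdf_by_hand))
```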

What does this mean for the lognormal distribution? In the canonical definition of the lognormal distribution, the term ln(x) appears. Naturally, the same term appears in SciPy's implementation. Given the considerations above, this is how loc and scale end up in the logarithm:

ln((x-loc)/scale)

By the quotient rule for logarithms, this is the same as

ln(x-loc) - ln(scale)

In the canonical definition of the lognormal distribution, the term is simply ln(x) - μ. Comparing SciPy's approach with the canonical one provides the crucial insight: ln(scale) represents μ. loc, however, has no counterpart in the canonical definition and is better left at 0. Further below (in the original answer), I argue that shape (s) is σ.
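The correspondence can also be checked numerically. The sketch below compares SciPy's density, with scale set to exp(μ), against a hand-written textbook density. The helper canonical_lognormal_pdf and the parameter values are hypothetical and introduced only for this illustration.

```python
import numpy as np
from scipy.stats import lognorm

def canonical_lognormal_pdf(x, mu, sigma):
    """Textbook lognormal density; mu and sigma describe ln(X)."""
    normalization = x * sigma * np.sqrt(2 * np.pi)
    return np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2)) / normalization

# Arbitrary illustrative values (assumed).
mu, sigma = 0.7, 1.3
x = np.linspace(0.1, 30.0, 200)

# SciPy parameterization: s = sigma, loc = 0, scale = exp(mu).
scipy_pdf = lognorm.pdf(x, s=sigma, loc=0, scale=np.exp(mu))

print(np.allclose(scipy_pdf, canonical_lognormal_pdf(x, mu, sigma)))
```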

Proof

>>> import math
>>> from scipy.stats import lognorm
>>> mu = 2
>>> sigma = 2
>>> l = lognorm(s=sigma, loc=0, scale=math.exp(mu))
>>> print("mean: %.5f  stddev: %.5f" % (l.mean(), l.std()))
mean: 54.59815  stddev: 399.71719

I use WolframAlpha as a reference. It provides analytically determined values for the mean and standard deviation of the lognormal distribution.

http://www.wolframalpha.com/input/?i=log-normal+distribution%2C+mean%3D2%2C+sd%3D2


The values match.

WolframAlpha and SciPy both obtain the mean and standard deviation by evaluating analytical expressions. Let's perform an empirical test by drawing many samples from the SciPy distribution and calculating their mean and standard deviation "manually" (from the whole set of samples):

>>> import numpy as np
>>> samples = l.rvs(size=2*10**7)
>>> print("mean: %.5f  stddev: %.5f" % (np.mean(samples), np.std(samples)))
mean: 54.52148  stddev: 380.14457

This is still not perfectly converged, but I think it is proof enough that the samples correspond to the same distribution that WolframAlpha assumed, given μ=2 and σ=2.
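The sample standard deviation converges slowly because, for σ=2, the lognormal distribution has a very heavy right tail. A check that converges much faster is to look at the logarithm of the samples, which should be normally distributed with mean μ and standard deviation σ. The following is a sketch of that idea; the seed and the sample size are arbitrary assumptions.

```python
import math
import numpy as np
from scipy.stats import lognorm

mu, sigma = 2.0, 2.0
rv = lognorm(s=sigma, loc=0, scale=math.exp(mu))

# Fixed seed (assumed value) so the run is reproducible.
samples = rv.rvs(size=200_000, random_state=12345)

# ln(X) should be normal with mean mu and standard deviation sigma.
log_samples = np.log(samples)
print("mean of ln(X): %.3f  stddev of ln(X): %.3f"
      % (log_samples.mean(), log_samples.std()))
```

Both printed values should lie close to 2, even with far fewer samples than in the test above.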

And another small edit: it looks like proper use of a search engine would have helped. We were not the first to be trapped by this:

https://stats.stackexchange.com/questions/33036/fitting-log-normal-distribution-in-r-vs-scipy
http://nbviewer.ipython.org/url/xweb.geos.ed.ac.uk/~jsteven5/blog/lognormal_distributions.ipynb
scipy, lognormal distribution - parameters

Another edit: now that I know how it behaves, I realize that the behavior is, in principle, documented. In the "notes" section we can read:

with shape parameter sigma and scale parameter exp(mu)

It is just really not obvious (neither of us appreciated the importance of this short sentence). I guess the reason we could not understand what this sentence means is that the analytical expression shown in the notes section does not include loc and scale. I guess this is worth a bug report / documentation improvement.

Original answer:

Indeed, the shape parameter topic is not well-documented when looking into the docs page for a particular distribution. I recommend having a look at the main stats documentation -- there is a section on shape parameters:

http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html#shape-parameters

It looks like there should be a lognorm.shapes property, telling you about what the s parameter means, specifically.

Edit: There is only one parameter, indeed:

>>> lognorm.shapes
's'

When comparing the general definition of the lognormal distribution (from Wikipedia):

f(x; μ, σ) = 1 / (σ*x*sqrt(2*pi)) * exp(-(ln(x) - μ)**2 / (2*σ**2))

and the formula given by the scipy docs:

lognorm.pdf(x, s) = 1 / (s*x*sqrt(2*pi)) * exp(-1/2*(log(x)/s)**2)

it becomes obvious that s is the true σ (sigma).
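A quick numerical cross-check of this reading, sketched under the assumption that loc and scale keep their defaults (0 and 1, so μ=0): if s is σ, then the lognormal cdf at x must equal the cdf of a normal distribution with standard deviation σ evaluated at ln(x). The value of sigma and the grid are arbitrary.

```python
import numpy as np
from scipy.stats import lognorm, norm

sigma = 2.0                          # arbitrary illustrative value
x = np.linspace(0.05, 50.0, 100)

# Lognormal with shape s = sigma and default loc=0, scale=1 (mu = 0).
lognormal_cdf = lognorm.cdf(x, s=sigma)

# Normal distribution of ln(X): mean 0, standard deviation sigma.
normal_cdf = norm.cdf(np.log(x), loc=0.0, scale=sigma)

print(np.allclose(lognormal_cdf, normal_cdf))
```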

However, from the docs it is not obvious how the loc parameter is related to μ (mu).

It could be as in ln(x-loc), which would not correspond to μ in the general formula, or it could be ln(x)-loc, which would ensure correspondence between loc and μ. Try it out! :)

Edit 2

I have compared the output of WolframAlpha (WA) with that of SciPy. WA is clear that it uses μ and σ as generally understood (as defined in the linked Wikipedia article).

>>> l = lognorm(s=2, loc=0)
>>> print("mean: %.5f  stddev: %.5f" % (l.mean(), l.std()))
mean: 7.38906  stddev: 54.09584

This matches WA's output.

Now, for loc not being zero, there is a mismatch. Example:

>>> l = lognorm(s=2, loc=1)
>>> print("mean: %.5f  stddev: %.5f" % (l.mean(), l.std()))
mean: 8.38906  stddev: 54.09584

For μ=1 and σ=2, WA gives a mean of 20.08 and a standard deviation of 147. There you have it: loc does not correspond to μ in the classical definition of the lognormal distribution.
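What loc does instead can be sketched as follows: it shifts the whole distribution along the x axis, so the mean moves by loc while the standard deviation is unchanged. WA's numbers for μ=1 are reproduced by putting exp(1) into scale. The variable names are chosen for this illustration only.

```python
import math
from scipy.stats import lognorm

base = lognorm(s=2, loc=0)                      # mu = 0, sigma = 2
shifted = lognorm(s=2, loc=1)                   # same shape, shifted right by 1
mu_is_one = lognorm(s=2, scale=math.exp(1))     # mu = 1, sigma = 2

# loc only shifts: the mean moves by loc, the spread stays the same.
print("shift in mean:  %.5f" % (shifted.mean() - base.mean()))
print("change in std:  %.5f" % (shifted.std() - base.std()))

# No probability mass below loc.
print("cdf below loc:  %.5f" % shifted.cdf(0.5))

# Setting mu via scale gives the values WA reports for mu=1, sigma=2.
print("mean: %.5f  stddev: %.5f" % (mu_is_one.mean(), mu_is_one.std()))
```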


answered Oct 20 '22 11:10

Dr. Jan-Philip Gehrcke
