In scipy, is there any way to sample from a normal distribution that has only been truncated on one side?
Say I have a standard normal distribution, with the domain (-inf, 0]
.
The scipy.stats.truncnorm
class provides utilities for distributions with a specific lower and upper bound, but is there a good way to do this if you only have one or the other, short of scipy.stats.truncnorm(a=-9999999, b=0, loc=0, scale=1)
?
truncnorm() is a Truncated Normal continuous random variable. It is inherited from the of generic methods as an instance of the rv_continuous class. It completes the methods with details specific for this particular distribution. size : [tuple of ints, optional] shape or random variates.
In probability and statistics, the truncated normal distribution is the probability distribution derived from that of a normally distributed random variable by bounding the random variable from either below or above (or both). The truncated normal distribution has wide applications in statistics and econometrics.
rvs(*args, **kwds)[source] Random variates of given type. Parameters arg1, arg2, arg3,… The shape parameter(s) for the distribution (see docstring of the instance object for more information).
The location ( loc ) keyword specifies the mean. The scale ( scale ) keyword specifies the standard deviation. As an instance of the rv_continuous class, norm object inherits from it a collection of generic methods (see below for the full list), and completes them with details specific for this particular distribution.
Using np.inf
(or -np.inf
) to set the bounds results in the distribution being unbounded on that side:
scipy.stats.truncnorm(a=-np.inf, b=0, loc=0, scale=1)
H/t to @warren-weckesser for the answer in comments.
What you are describing is the Half-normal distribution (its mirror to the y-axis to be precise), a special case where the Folded Normal Distribution and the Truncated Normal Distribution are equivalent. scipy also provides that distribution: scipy.stats.foldnorm. The only reason I'm mentioning this is (not to be pedantic) because it is quite convoluted how scipy and numpy define these distributions (see below: mu
& sigma
vs. beta
, scale
, loc
, a
& b
)
Run this in a notebook and the differences become apparent:
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
resolution=10000
mu, sigma = 0.0, 0.2
normdist = lambda x: 1/(sigma * np.sqrt(2 * np.pi)) * np.exp( - (x - mu)**2 / (2 * sigma**2) )
f = np.random.normal(mu, sigma, resolution)
count, bins, ignored = plt.hist(f, 'auto', density=True)
plt.plot(bins, [normdist(x) for x in bins], 'r--', linewidth=2, label='(np)norm.dist')
beta=2.0
scale=1/np.sqrt(2)*(2*sigma)
nd = st.gennorm(beta, loc=mu, scale=scale)
x = np.linspace(nd.ppf(0.01), nd.ppf(0.99), resolution)
plt.plot(x, nd.pdf(x),'g-', lw=2, alpha=0.6, label='(sc)gennorm pdf')
print(f'mean: {nd.mean()}, std: {nd.std()}')
c=0.0
fnd = st.foldnorm(c, loc=0.0, scale=scale)
x = np.arange(-1.0, 5.0, 1/resolution)
plt.plot(x, fnd.pdf(x),'k-', lw=2, alpha=0.6, label='(sc)foldnorm pdf')
# count, bins, ignored = plt.hist(fnd.rvs(size=resolution), 'auto', density=True)
print(f'mean: {fnd.mean()}, std: {fnd.std()}')
a=0
b=np.inf
tnd = st.truncnorm(a=a, b=b, loc=0, scale=scale)
plt.plot(x, tnd.pdf(x),'b-', lw=1, alpha=0.6, label='(sc)truncnorm pdf')
# count, bins, ignored = plt.hist(tnd.rvs(size=resolution), 'auto', density=True)
plt.plot(x, [normdist(x) for x in x], 'r--', lw=1, label='(np)norm.dist')
print(f'mean: {fnd.mean()}, std: {fnd.std()}')
plt.legend()
plt.show()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With