Does anyone have suggestions for efficiently truncating SciPy's random distributions? For example, if I generate random values like so:
import scipy.stats as stats
print(stats.logistic.rvs(loc=0, scale=1, size=1000))
How would I go about constraining the output values between 0 and 1 without changing the original parameters of the distribution and without changing the sample size, all while minimizing the amount of work the machine has to do?
Your question is more of a statistics question than a SciPy question. In general, to build an efficient sampler you need to renormalize the density over the interval you are interested in and be able to evaluate the CDF, and its inverse, on that interval. Edit: It turns out this is possible here, because the logistic distribution's CDF and inverse CDF (ppf) have closed forms, so rejection sampling is not needed:
import scipy.stats as stats
import matplotlib.pyplot as plt
import numpy as np
import numpy.random as rnd
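# Plot the original (untruncated) logistic PDF
xrng = np.arange(-10, 10, .1)
yrng = stats.logistic.pdf(xrng)
plt.plot(xrng, yrng)
# Plot the truncated PDF on [0, 1]: the same density divided by
# the probability mass F(1) - F(0) that the interval carries
nrm = stats.logistic.cdf(1) - stats.logistic.cdf(0)
xrng = np.arange(0, 1, .01)
yrng = stats.logistic.pdf(xrng) / nrm
plt.plot(xrng, yrng)
# Sample using the inverse CDF: spread uniform draws over [F(0), F(1)]
# and map them back through the ppf, so every sample lands in [0, 1]
yr = rnd.rand(100000) * nrm + stats.logistic.cdf(0)
xr = stats.logistic.ppf(yr)
plt.hist(xr, density=True)
plt.show()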
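The same inverse-transform idea generalizes to any SciPy distribution that provides `cdf` and `ppf`. The helper below is a minimal sketch, not a SciPy API: `truncated_rvs` and its parameter names are hypothetical. It assumes a frozen distribution such as `stats.logistic(loc=0, scale=1)`, so the original `loc`/`scale` stay untouched, and it computes x = F^-1(F(a) + U * (F(b) - F(a))) with U uniform on [0, 1). No draws are rejected, so you get exactly `size` samples.

```python
import numpy as np
from scipy import stats


def truncated_rvs(dist, lower, upper, size, rng=None):
    """Hypothetical helper: draw `size` samples from `dist` restricted to [lower, upper].

    Assumes `dist` is a frozen scipy.stats distribution (it must provide
    .cdf and .ppf). Inverse-transform sampling: uniform draws on
    [F(lower), F(upper)) are pushed through the inverse CDF.
    """
    rng = np.random.default_rng(rng)
    cdf_lo, cdf_hi = dist.cdf(lower), dist.cdf(upper)
    # Uniform draws restricted to the CDF range of the interval
    u = rng.uniform(cdf_lo, cdf_hi, size=size)
    # Map back to the original scale; every value falls inside [lower, upper]
    return dist.ppf(u)


# Same setup as the question: logistic(loc=0, scale=1), 1000 samples in [0, 1]
samples = truncated_rvs(stats.logistic(loc=0, scale=1), 0.0, 1.0, size=1000, rng=42)
print(samples.size, samples.min(), samples.max())
```

One caveat with this design: when the interval sits far out in the upper tail, F(lower) and F(upper) can both round to 1.0 and the samples collapse to a single value. In that case, working with the survival function (`dist.sf` / `dist.isf`) instead of `cdf` / `ppf` keeps more precision.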