I want to create a customized distribution based on a truncated Lévy law, which reads
p(r) = (r + r0)**(-beta) * exp(-r/k).
So I defined it in the following way:
import numpy as np
import scipy.stats as st
class LevyPDF(st.rv_continuous):
    def _pdf(self, r):
        r0 = 100
        k = 1500
        beta = 1.6
        return (r + r0)**(-beta)*np.exp(-r/k)
Suppose that I want to find the distribution of distances between r = 0 and r = 50km. Then:
nmin = 0
nmax = 50
my_cv = LevyPDF(a=nmin, b=nmax, name='LevyPDF')
x = np.linspace(nmin, nmax, (nmax-nmin)*2)
I do not understand why:
sum(my_cv.cdf(x)) = 2.22
instead of 1.
Then how can I define a histogram of N = 2000000 random distances based on the distribution that I defined?
In the past few years, nice new tools have been added to SciPy to address this kind of problem in Python. You can easily generate samples from custom continuous or discrete univariate distributions by just providing some information about the distribution, such as the density / pdf.
Overview of the different methods: https://docs.scipy.org/doc/scipy/reference/stats.sampling.html
Tutorial: https://docs.scipy.org/doc/scipy/tutorial/stats/sampling.html
Here is how it works for your problem. Note that this is much faster than relying on the default methods of scipy.stats.rv_continuous (integrating the pdf with quad, then inverting the cdf with a root-finding algorithm), which are slow. As a quick test: generating 1000 rvs with the approach presented in jlandercy's post takes a few seconds on my computer, but less than a millisecond with the fast inversion method below.
import numpy as np
from scipy.stats.sampling import NumericalInversePolynomial
from matplotlib import pyplot as plt
from scipy.integrate import quad
class MyDist:
    def __init__(self, r0, k, beta):
        self.r0 = r0
        self.k = k
        self.beta = beta

    def support(self):
        # distribution restricted to (0, 50) as indicated in the question;
        # could be changed to other intervals
        return (0, 50)

    def pdf(self, x):
        # this is not a proper pdf: the normalizing constant is missing
        # (it does not integrate to one), which is fine for this method
        return (x + self.r0)**(-self.beta)*np.exp(-x/self.k)

r0 = 100
k = 1500
beta = 1.6
dist = MyDist(r0, k, beta)
gen = NumericalInversePolynomial(dist)

# compute the missing normalizing constant to plot the pdf
const_pdf = quad(dist.pdf, *dist.support())[0]

r = gen.rvs(size=10000)
x = np.linspace(r.min(), r.max(), 500)

# show the histogram together with the pdf
plt.plot(x, dist.pdf(x) / const_pdf)
plt.hist(r, density=True, bins=50)
plt.show()
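Scaling this to the N = 2000000 distances asked for in the question is cheap: the expensive part is setting up the generator, while drawing variates afterwards is vectorised. A self-contained sketch with the same unnormalised density as above:

```python
import numpy as np
from scipy.stats.sampling import NumericalInversePolynomial

class MyDist:
    def __init__(self, r0, k, beta):
        self.r0 = r0
        self.k = k
        self.beta = beta

    def support(self):
        # restrict the distribution to (0, 50) as in the question
        return (0, 50)

    def pdf(self, x):
        # unnormalised density; NumericalInversePolynomial only needs
        # the density up to a constant factor
        return (x + self.r0)**(-self.beta)*np.exp(-x/self.k)

# the polynomial interpolation of the inverse cdf is built here, once
gen = NumericalInversePolynomial(MyDist(100, 1500, 1.6), random_state=1)

# drawing two million variates afterwards is fast
N = 2000000
r = gen.rvs(size=N)
print(r.shape)  # (2000000,)
```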

Using your minimal example (slightly adapted):
import scipy.stats as st
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import quad

nmin = 0
nmax = 50

def levy(r, r0=100, k=1500, beta=1.6):
    return (r + r0)**(-beta)*np.exp(-r/k)

# rv_continuous expects _pdf to integrate to one over [a, b]:
# without the normalizing constant, cdf, ppf and rvs misbehave
norm = quad(levy, nmin, nmax)[0]

class LevyPDF(st.rv_continuous):
    def _pdf(self, r):
        return levy(r) / norm

my_cv = LevyPDF(a=nmin, b=nmax, name='LevyPDF')
To sample from your random variable, use the rvs() method from the rv_continuous class:
N = 50000
X = my_cv.rvs(size=N, random_state=1)
This will return an array of shape (N,) with random variates sampled from your distribution. Use the random_state option to freeze your example and make your script repeatable (it sets the random seed for the sampling).
Note that computation time increases drastically even as N increases modestly: each variate is generated by numerically inverting the cdf.
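To illustrate the random_state behaviour with a cheap built-in distribution (a small sketch; a frozen chi-squared stands in for LevyPDF, whose generic sampling is slow): the same seed reproduces the same draws.

```python
from scipy import stats

rv = stats.chi2(40)

# identical seeds give identical variates, making scripts repeatable
a = rv.rvs(size=5, random_state=1)
b = rv.rvs(size=5, random_state=1)
print((a == b).all())  # True
```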
To plot histogram, use matplotlib library, see hist:
fig, axe = plt.subplots()
n, bins, patches = axe.hist(X, 50, density=True, facecolor='green', alpha=0.75)
plt.show()
Below is an example of sampling from a Chi-Squared distribution with 40 degrees of freedom:
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
rv = stats.chi2(40)
N = 200000
X = rv.rvs(size=N, random_state=1)
fig, axe = plt.subplots()
n, bins, patches = axe.hist(X, 50, density=True, facecolor='green', alpha=0.75)
plt.show()
It leads to a histogram that closely follows the Chi-Squared density.
