Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: how to define customized distributions?

I want to create a customized distribution based on a Levy truncated law, which reads

p(r) = (r + r0)**(-beta)*exp(-r/k).

So I defined it in the following way:

import numpy as np
import scipy.stats as st

class LevyPDF(st.rv_continuous):
    def _pdf(self,r):
        r0 = 100
        k = 1500
        beta = 1.6
        return (r + r0)**(-beta)*np.exp(-r/k)

Suppose that I want to find the distribution of distances between r = 0 and r = 50km. Then:

nmin = 0
nmax = 50
my_cv = LevyPDF(a=nmin, b=nmax, name='LevyPDF')
x = np.linspace(nmin, nmax, (nmax-nmin)*2)

I do not understand why:

sum(my_cv.cdf(x)) = 2.22

instead of 1.

Then how can I define an histogram of N = 2000000 random distances based on the distribution that I defined?

like image 668
emax Avatar asked Dec 03 '25 07:12

emax


2 Answers

In the past few years, nice new tools have been added to SciPy to address this kind of problem in Python. You can easily generate samples from custom continuous or discrete univariate distributions by just providing some information about the distribution, such as the density / pdf.

Overview of the different methods: https://docs.scipy.org/doc/scipy/reference/stats.sampling.html

Tutorial: https://docs.scipy.org/doc/scipy/tutorial/stats/sampling.html

Here is how it works for your problem. Note that this will be much faster than relying on the default methods (integrating the pdf with quad, then inverting the cdf with a root-finding algorithm) in scipy.stats.rv_continuous which are slow. I did a quick test: generating 1000 rvs with the approach presented in jlandercy's post takes a few seconds, but less than a millisecond with the fast inversion method below on my computer.

import numpy as np
from scipy.stats.sampling import NumericalInversePolynomial
from matplotlib import pyplot as plt
from scipy.integrate import quad


class MyDist:
    def __init__(self, r0, k, beta):
        self.r0 = r0
        self.k = k
        self.beta = beta

    def support(self):
        # distribution restricted to 0, 50 as indicated in the question
        # could be changed to other intervals
        return (0, 50)

    def pdf(self, x):
        # this is not a proper pdf, the normalizing
        # constant is missing (does not integrate to one)
        # this is ok for the method presented
        return (x + self.r0)**(-self.beta)*np.exp(-x/self.k)


r0 = 100
k = 1500
beta = 1.6

dist = MyDist(r0, k, beta)
gen = NumericalInversePolynomial(dist)

# compute the missing normalizing constant to plot the pdf
const_pdf = quad(dist.pdf, *dist.support())[0]

r = gen.rvs(size=10000)
x = np.linspace(r.min(), r.max(), 500)

# show histogram together with the pdf
plt.plot(x, dist.pdf(x) / const_pdf)
plt.hist(r, density=True, bins=50)
plt.show()

Density and histogram of the generated samples

like image 188
Christoph Baumgarten Avatar answered Dec 05 '25 20:12

Christoph Baumgarten


Using your minimal example (slightly adapted):

import scipy.stats as st
import numpy as np
import matplotlib.pyplot as plt

class LevyPDF(st.rv_continuous):
    def _pdf(self,r):
        r0 = 100
        k = 1500
        beta = 1.6
        return (r + r0)**(-beta)*np.exp(-r/k)

nmin = 0
nmax = 50
my_cv = LevyPDF(a=nmin, b=nmax, name='LevyPDF')

To sample from your random variable, use rvs() method from rv_continuous class:

N = 50000
X = my_cv.rvs(size=N, random_state=1)

Will return an array of size (N,) with random variates sampled from your distribution. Use random_state option to freeze your example and make your script repeatable (it defines random seed for your sampling).

Note as N softly increases, computation time drastically increases.

To plot histogram, use matplotlib library, see hist:

fig, axe = plt.subplots()
n, bins, patches = axe.hist(X, 50, normed=1, facecolor='green', alpha=0.75)
plt.show(axe)

Bellow a example of sampling from Chi Square with 40 Degrees of Freedom:

from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
rv = stats.chi2(40)
N = 200000
X = rv.rvs(size=N, random_state=1)
fig, axe = plt.subplots()
n, bins, patches = axe.hist(X, 50, normed=1, facecolor='green', alpha=0.75)
plt.show(axe)

It leads to:

Chi2Sample

like image 22
jlandercy Avatar answered Dec 05 '25 22:12

jlandercy