Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generating random numbers with a given probability density function

I want to specify the probability density function of a distribution and then pick up N random numbers from that distribution in Python. How do I go about doing that?

like image 855
user2445465 Avatar asked Aug 24 '14 11:08

user2445465


People also ask

How do you generate a random number from probability?

In fact, to generate random values with the probability, you only need two formulas. 2. Select a blank cell which you will place the random value at, type this formula =INDEX(A$2:A$8,COUNTIF(C$2:C$8,"<="&RAND())+1), press Enter key. And press F9 key to refresh the value as you need.

How do you generate a random number from a CDF?

Example 0: y=1 Now, to invert the cdf, we flip x and y, and then solve for y again. It's trivially easy… , we can generate uniform random numbers, plug them into that equation as x and get y which is the actual value drawn from our PDF.

How do you find the probability density function of a random variable?

The probability density function (pdf) f(x) of a continuous random variable X is defined as the derivative of the cdf F(x): f(x)=ddxF(x).


2 Answers

In general, you want to have the inverse cumulative probability density function. Once you have that, then generating the random numbers along the distribution is simple:

import random

def sample(n):
    return [ icdf(random.random()) for _ in range(n) ]

Or, if you use NumPy:

import numpy as np

def sample(n):
    return icdf(np.random.random(n))

In both cases icdf is the inverse cumulative distribution function which accepts a value between 0 and 1 and outputs the corresponding value from the distribution.

To illustrate the nature of icdf, we'll take a simple uniform distribution between values 10 and 12 as an example:

  • probability distribution function is 0.5 between 10 and 12, zero elsewhere

  • cumulative distribution function is 0 below 10 (no samples below 10), 1 above 12 (no samples above 12) and increases linearly between the values (integral of the PDF)

  • inverse cumulative distribution function is only defined between 0 and 1. At 0 it is 10, at 12 it is 1, and changes linearly between the values

Of course, the difficult part is obtaining the inverse cumulative density function. It really depends on your distribution, sometimes you may have an analytical function, sometimes you may want to resort to interpolation. Numerical methods may be useful, as numerical integration can be used to create the CDF and interpolation can be used to invert it.

like image 55
DrV Avatar answered Sep 19 '22 08:09

DrV


This is my function to retrieve a single random number distributed according to the given probability density function. I used a Monte-Carlo like approach. Of course n random numbers can be generated by calling this function n times.

    """
    Draws a random number from given probability density function.

    Parameters
    ----------
        pdf       -- the function pointer to a probability density function of form P = pdf(x)
        interval  -- the resulting random number is restricted to this interval
        pdfmax    -- the maximum of the probability density function
        integers  -- boolean, indicating if the result is desired as integer
        max_iterations -- maximum number of 'tries' to find a combination of random numbers (rand_x, rand_y) located below the function value calc_y = pdf(rand_x).

    returns a single random number according the pdf distribution.
    """
    def draw_random_number_from_pdf(pdf, interval, pdfmax = 1, integers = False, max_iterations = 10000):
        for i in range(max_iterations):
            if integers == True:
                rand_x = np.random.randint(interval[0], interval[1])
            else:
                rand_x = (interval[1] - interval[0]) * np.random.random(1) + interval[0] #(b - a) * random_sample() + a

            rand_y = pdfmax * np.random.random(1) 
            calc_y = pdf(rand_x)

            if(rand_y <= calc_y ):
                return rand_x

        raise Exception("Could not find a matching random number within pdf in " + max_iterations + " iterations.")

In my opinion this solution is performing better than other solutions if you do not have to retrieve a very large number of random variables. Another benefit is that you only need the PDF and avoid calculating the CDF, inverse CDF or weights.

like image 28
linksfate Avatar answered Sep 20 '22 08:09

linksfate