Generating random numbers from arbitrary probability density function

I would like to be able to generate random numbers with a probability density function that comes from a drawn curve. These two below have the same area under the curve but should produce lists of random numbers with different characteristics.

[Image: two drawn curves with the same area under them]

My intuition is that one way to do it is to sample the curve into rectangles, use the areas of those rectangles as weights for np.random.choice to pick a rectangle, and then draw an ordinary uniform random number within that rectangle's range.

[Image: the curve sampled into rectangles]

This doesn't feel like a very efficient way to do it. Is there a more 'correct' way to do it?

I had a crack at actually doing it:

import matplotlib.pyplot as plt
import numpy as np

# Area of each rectangle under the sampled curve (27 rectangles)
areas = [4.397498, 4.417111, 4.538467, 4.735034, 4.990129, 5.292455, 5.633938,
         6.008574, 6.41175, 5.888393, 2.861898, 2.347887, 2.459234, 2.494357,
         2.502986, 2.511614, 2.520243, 2.528872, 2.537501, 2.546129, 7.223747,
         7.223747, 2.448148, 1.978746, 1.750221, 1.659351, 1.669999]
# Rectangle edges along x (28 edges for 27 rectangles)
divisions = [0.0, 0.037037, 0.074074, 0.111111, 0.148148, 0.185185, 0.222222,
             0.259259, 0.296296, 0.333333, 0.37037, 0.407407, 0.444444, 0.481481,
             0.518519, 0.555556, 0.592593, 0.62963, 0.666667, 0.703704, 0.740741,
             0.777778, 0.814815, 0.851852, 0.888889, 0.925926, 0.962963, 1.0]
# Normalize the areas so they sum to 1 and can be used as probabilities
weights = [a/sum(areas) for a in areas]
# Pick a rectangle for each sample, weighted by its area
indexes = np.random.choice(range(len(areas)), 50000, p=weights)
# Draw a uniform value inside each chosen rectangle
samples = []
for i in indexes:
    samples.append(np.random.uniform(divisions[i], divisions[i+1]))

binwidth = 0.02
bin_edges = np.arange(min(samples), max(samples) + binwidth, binwidth)
plt.hist(samples, bins=bin_edges)
plt.xlim(right=1)
plt.show()

[Image: histogram of the generated samples]

The method seems to work, but is a bit heavy!

Ben, asked Jan 15 '17 04:01



2 Answers

For your case, it seems like a histogram-based approach would definitely be easiest, since you have a line that the user has drawn.

But since you're just trying to generate random numbers from that distribution, you can use the normalized y-values (each pixel's y-position divided by the sum of all y-positions) directly as the probability_distribution in the function below, with arrays whose length is the number of pixels the user has drawn.

from numpy.random import choice
# list_of_candidates: the x-values; probability_distribution: the normalized y-values
samples = choice(list_of_candidates, number_of_items_to_pick, p=probability_distribution)

probability_distribution (the normalized pixel y-values) is a sequence in the same order as list_of_candidates (the associated x-values). You can also pass the keyword replace=False so that drawn items are not put back before the next draw.

See the NumPy documentation for numpy.random.choice.

This should be much faster since you're not actually generating an entire pdf, just drawing random numbers that match the pdf.
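As a rough illustration of that call with concrete inputs, here is a minimal sketch. The names x_values and y_values and the sine-shaped curve are made-up stand-ins for the pixel coordinates of a drawn curve, not data from the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

# Hypothetical stand-ins for the drawn curve: one x and one y per pixel column
x_values = np.linspace(0.0, 1.0, 200)
y_values = 1.0 + np.sin(2 * np.pi * x_values) ** 2

# Normalize the heights so they sum to 1 and can serve as probabilities
probabilities = y_values / y_values.sum()

# Draw x-values with probability proportional to the curve height
samples = rng.choice(x_values, size=10_000, p=probabilities)
```

Note that the samples drawn this way can only take the 200 values in x_values. If continuous output is needed, one option is to add a small uniform offset within each pixel's width.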

EDIT: your update looks like a solid approach. If you do want to generate the pdf, you might consider investigating numba (http://numba.pydata.org) to speed up your for loop.
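The rectangle-sampling loop from the question can probably also be written without a Python-level loop using plain NumPy array indexing, which may be enough before reaching for numba. The sketch below assumes the same inputs as the question (one area per rectangle and one more edge than there are areas). The function name sample_from_rectangles and the two-rectangle example data are hypothetical.

```python
import numpy as np

def sample_from_rectangles(areas, edges, size, rng=None):
    """Draw `size` samples from a piecewise-constant density.

    areas: probability mass of each rectangle (need not be normalized)
    edges: rectangle boundaries, len(edges) == len(areas) + 1
    """
    rng = np.random.default_rng() if rng is None else rng
    areas = np.asarray(areas, dtype=float)
    edges = np.asarray(edges, dtype=float)
    weights = areas / areas.sum()
    # Choose a rectangle index for every sample in one call
    idx = rng.choice(len(areas), size=size, p=weights)
    # Draw uniformly between the left and right edge of each chosen rectangle
    return rng.uniform(edges[idx], edges[idx + 1])

# Hypothetical example: the right half carries three times the mass of the left
rng = np.random.default_rng(seed=0)
areas = [1.0, 3.0]
edges = [0.0, 0.5, 1.0]
samples = sample_from_rectangles(areas, edges, size=50_000, rng=rng)
upper_fraction = np.mean(samples >= 0.5)
```

With these inputs, about three quarters of the samples should land in the upper half, since that rectangle holds three quarters of the total area.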

Greg Jennings, answered Sep 18 '22 01:09


One way to do it is to use rv_continuous from scipy.stats. A straightforward way to begin would be to approximate one of those pdfs with a collection of splines and wrap it in an rv_continuous subclass. In fact, you can generate pseudorandom deviates with it by defining either a pdf or a cdf.
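As a hedged sketch of the scipy.stats route: rather than fitting splines, the example below swaps in scipy.stats.rv_histogram, a ready-made rv_continuous subclass for piecewise-constant densities, which matches the rectangle data in the question more directly. The four areas and the evenly spaced edges are made-up illustration values.

```python
import numpy as np
from scipy import stats

# Hypothetical rectangle areas and evenly spaced edges on [0, 1]
areas = np.array([1.0, 3.0, 2.0, 2.0])
edges = np.linspace(0.0, 1.0, len(areas) + 1)

# rv_histogram takes (bin contents, bin edges) and normalizes them itself
dist = stats.rv_histogram((areas, edges))

# The resulting object offers the usual rv_continuous methods
samples = dist.rvs(size=20_000, random_state=0)
density_at_point = dist.pdf(0.3)
mass_below_half = dist.cdf(0.5)
```

A spline-based pdf, as suggested above, would need a custom rv_continuous subclass instead. Supplying a cdf (and ideally its inverse) alongside the pdf tends to make sampling much faster than relying on the generic numerical fallback.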

Bill Bell, answered Sep 18 '22 01:09