I would like to be able to generate random numbers with a probability density function that comes from a drawn curve. These two below have the same area under the curve but should produce lists of random numbers with different characteristics.
My intuition is that one way to do it is to sample the curve, use the areas of the resulting rectangles as weights for np.random.choice to pick a rectangle,
and then draw an ordinary uniform random number within that rectangle's range.
This doesn't feel like a very efficient way to do it. Is there a more 'correct' way to do it?
I had a crack at actually doing it:
import matplotlib.pyplot as plt
import numpy as np
areas = [4.397498, 4.417111, 4.538467, 4.735034, 4.990129, 5.292455, 5.633938,
6.008574, 6.41175, 5.888393, 2.861898, 2.347887, 2.459234, 2.494357,
2.502986, 2.511614, 2.520243, 2.528872, 2.537501, 2.546129, 7.223747,
7.223747, 2.448148, 1.978746, 1.750221, 1.659351, 1.669999]
divisons = [0.0, 0.037037, 0.074074, 0.111111, 0.148148, 0.185185, 0.222222,
0.259259, 0.296296, 0.333333, 0.37037, 0.407407, 0.444444, 0.481481,
0.518519, 0.555556, 0.592593, 0.62963, 0.666667, 0.703704, 0.740741,
0.777778, 0.814815, 0.851852, 0.888889, 0.925926, 0.962963, 1.0]
weights = [a/sum(areas) for a in areas]
indexes = np.random.choice(range(len(areas)), 50000, p=weights)
samples = []
for i in indexes:
    samples.append(np.random.uniform(divisons[i], divisons[i+1]))
binwidth = 0.02
binSize = np.arange(min(samples), max(samples) + binwidth, binwidth)
plt.hist(samples, bins=binSize)
plt.xlim(xmax=1)
plt.show()
The method seems to work, but is a bit heavy!
For your case, a histogram-based approach seems like it would definitely be the easiest, since you have a line that the user has drawn.
But since you're just trying to generate random numbers from that distribution, you can use the normalized y-values (each pixel's y-position divided by the sum of all y-positions) directly as the probability_distribution in the function below, with list_of_candidates being an array of the same length holding one x-value per pixel the user has drawn.
from numpy.random import choice
pde = choice(list_of_candidates, number_of_items_to_pick, p=probability_distribution)
probability_distribution (the normalized pixel y-values) is a sequence in the same order as list_of_candidates (the associated x-values). You can also pass the keyword replace=False so that drawn items are not replaced.
See the NumPy documentation for numpy.random.choice.
This should be much faster since you're not actually generating an entire pdf, just drawing random numbers that match the pdf.
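As a rough illustration of the idea above, here is a minimal runnable sketch. The curve itself (`x_pixels`, `y_pixels`) is made-up stand-in data, not the curve from the question, and the final jitter step is an optional extra that is not part of the original suggestion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the drawn curve: one height per pixel column.
x_pixels = np.linspace(0.0, 1.0, 200)                # candidate x-values
y_pixels = 1.0 + 0.8 * np.sin(2 * np.pi * x_pixels)  # curve heights, all > 0

# Normalize the heights so they sum to 1 and can serve as probabilities.
probabilities = y_pixels / y_pixels.sum()

# Draw x-values with probability proportional to the curve height.
samples = rng.choice(x_pixels, size=50_000, p=probabilities)

# Optional: spread each draw uniformly across its pixel column so the
# result is continuous rather than restricted to the pixel positions.
pixel_width = x_pixels[1] - x_pixels[0]
samples_jittered = samples + rng.uniform(-0.5, 0.5, size=samples.size) * pixel_width
```

Without the jitter, every sample is exactly one of the values in `x_pixels`, which may or may not matter depending on how fine the pixel grid is.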
EDIT: your update looks like a solid approach. If you do want to generate the pdf, you might consider investigating numba (http://numba.pydata.org) to vectorize your for loop.
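The loop can probably also be removed with plain NumPy, without numba, because the uniform sampler accepts arrays for its bounds. The sketch below assumes the same rectangle-based scheme as the question, but uses short made-up `areas` and `edges` arrays for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example data: relative area of each rectangle and the
# rectangle edges (one more edge than there are rectangles).
areas = np.array([4.0, 6.0, 2.5, 7.0, 1.5])
edges = np.linspace(0.0, 1.0, areas.size + 1)

weights = areas / areas.sum()

# Pick a rectangle index for every sample in a single call.
idx = rng.choice(areas.size, size=50_000, p=weights)

# Generator.uniform broadcasts array-valued bounds, so each sample is
# drawn inside its own rectangle without a Python-level loop.
samples = rng.uniform(edges[idx], edges[idx + 1])
```

With the data from the question, `areas` and `divisons` would take the place of the two made-up arrays once converted with `np.asarray`.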
One way to do it is to use rv_continuous from scipy.stats. The straightforward way to begin would be to approximate one of those pdfs with a collection of splines and wrap it in an rv_continuous subclass. In fact, you can generate pseudorandom deviates by defining either a pdf or a cdf with this class.
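A minimal sketch of what such a subclass might look like is given below. It is an assumption-laden illustration: it uses a piecewise-constant pdf (rectangles, as in the question) rather than splines, the class name `DrawnCurve` and the data are hypothetical, and it assumes every area is strictly positive. It overrides `_cdf` and `_ppf` as well as `_pdf`, since relying on the generic numerical integration and inversion in rv_continuous tends to be slow when drawing many samples.

```python
import numpy as np
from scipy import stats


class DrawnCurve(stats.rv_continuous):
    # Hypothetical distribution with a piecewise-constant pdf: bin i covers
    # edges[i]..edges[i+1] and carries probability areas[i] / sum(areas).

    def __init__(self, edges, areas):
        edges = np.asarray(edges, dtype=float)
        areas = np.asarray(areas, dtype=float)
        super().__init__(a=float(edges[0]), b=float(edges[-1]))
        self._edges = edges
        weights = areas / areas.sum()
        self._heights = weights / np.diff(edges)  # pdf value inside each bin
        self._cdf_at_edges = np.concatenate(([0.0], np.cumsum(weights)))

    def _pdf(self, x):
        i = np.searchsorted(self._edges, x, side="right") - 1
        return self._heights[np.clip(i, 0, self._heights.size - 1)]

    def _cdf(self, x):
        # The cdf of a piecewise-constant pdf is piecewise linear.
        return np.interp(x, self._edges, self._cdf_at_edges)

    def _ppf(self, q):
        # Inverse cdf: swap the roles of the two arrays in the interpolation.
        return np.interp(q, self._cdf_at_edges, self._edges)


# Made-up rectangle data for illustration.
areas = np.array([4.0, 6.0, 2.5, 7.0, 1.5])
edges = np.linspace(0.0, 1.0, areas.size + 1)
weights = areas / areas.sum()

dist = DrawnCurve(edges, areas)
samples = dist.rvs(size=50_000, random_state=0)
```

Because `_ppf` is defined, `rvs` only has to push uniform random numbers through the inverse cdf, which is the inversion method in its usual form.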