Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Random Number from Histogram

Suppose I create a histogram using scipy/numpy, so I have two arrays: one for the bin counts, and one for the bin edges. If I use the histogram to represent a probability distribution function, how can I efficiently generate random numbers from that distribution?

like image 857
xvtk Avatar asked Jul 23 '13 21:07

xvtk


People also ask

How do you generate a random number from probability?

In fact, to generate random values with the probability, you only need two formulas. 2. Select a blank cell which you will place the random value at, type this formula =INDEX(A$2:A$8,COUNTIF(C$2:C$8,"<="&RAND())+1), press Enter key. And press F9 key to refresh the value as you need.


2 Answers

It's probably what np.random.choice does in @Ophion's answer, but you can construct a normalized cumulative density function, then choose based on a uniform random number:

from __future__ import division import numpy as np import matplotlib.pyplot as plt  data = np.random.normal(size=1000) hist, bins = np.histogram(data, bins=50)  bin_midpoints = bins[:-1] + np.diff(bins)/2 cdf = np.cumsum(hist) cdf = cdf / cdf[-1] values = np.random.rand(10000) value_bins = np.searchsorted(cdf, values) random_from_cdf = bin_midpoints[value_bins]  plt.subplot(121) plt.hist(data, 50) plt.subplot(122) plt.hist(random_from_cdf, 50) plt.show() 

enter image description here


A 2D case can be done as follows:

data = np.column_stack((np.random.normal(scale=10, size=1000),                         np.random.normal(scale=20, size=1000))) x, y = data.T                         hist, x_bins, y_bins = np.histogram2d(x, y, bins=(50, 50)) x_bin_midpoints = x_bins[:-1] + np.diff(x_bins)/2 y_bin_midpoints = y_bins[:-1] + np.diff(y_bins)/2 cdf = np.cumsum(hist.ravel()) cdf = cdf / cdf[-1]  values = np.random.rand(10000) value_bins = np.searchsorted(cdf, values) x_idx, y_idx = np.unravel_index(value_bins,                                 (len(x_bin_midpoints),                                  len(y_bin_midpoints))) random_from_cdf = np.column_stack((x_bin_midpoints[x_idx],                                    y_bin_midpoints[y_idx])) new_x, new_y = random_from_cdf.T  plt.subplot(121, aspect='equal') plt.hist2d(x, y, bins=(50, 50)) plt.subplot(122, aspect='equal') plt.hist2d(new_x, new_y, bins=(50, 50)) plt.show() 

enter image description here

like image 55
Jaime Avatar answered Sep 24 '22 10:09

Jaime


@Jaime solution is great, but you should consider using the kde (kernel density estimation) of the histogram. A great explanation why it's problematic to do statistics over histogram, and why you should use kde instead can be found here

I edited @Jaime's code to show how to use kde from scipy. It looks almost the same, but captures better the histogram generator.

from __future__ import division import numpy as np import matplotlib.pyplot as plt from scipy.stats import gaussian_kde  def run():     data = np.random.normal(size=1000)     hist, bins = np.histogram(data, bins=50)      x_grid = np.linspace(min(data), max(data), 1000)     kdepdf = kde(data, x_grid, bandwidth=0.1)     random_from_kde = generate_rand_from_pdf(kdepdf, x_grid)      bin_midpoints = bins[:-1] + np.diff(bins) / 2     random_from_cdf = generate_rand_from_pdf(hist, bin_midpoints)      plt.subplot(121)     plt.hist(data, 50, normed=True, alpha=0.5, label='hist')     plt.plot(x_grid, kdepdf, color='r', alpha=0.5, lw=3, label='kde')     plt.legend()     plt.subplot(122)     plt.hist(random_from_cdf, 50, alpha=0.5, label='from hist')     plt.hist(random_from_kde, 50, alpha=0.5, label='from kde')     plt.legend()     plt.show()   def kde(x, x_grid, bandwidth=0.2, **kwargs):     """Kernel Density Estimation with Scipy"""     kde = gaussian_kde(x, bw_method=bandwidth / x.std(ddof=1), **kwargs)     return kde.evaluate(x_grid)   def generate_rand_from_pdf(pdf, x_grid):     cdf = np.cumsum(pdf)     cdf = cdf / cdf[-1]     values = np.random.rand(1000)     value_bins = np.searchsorted(cdf, values)     random_from_cdf = x_grid[value_bins]     return random_from_cdf 

enter image description here

like image 40
Noam Peled Avatar answered Sep 24 '22 10:09

Noam Peled