Random Number from Histogram

Tags:

Suppose I create a histogram using scipy/numpy, so I have two arrays: one for the bin counts, and one for the bin edges. If I use the histogram to represent a probability distribution function, how can I efficiently generate random numbers from that distribution?

857

asked Jul 23 '13 21:07

xvtk

2 Answers

It's probably what np.random.choice does in @Ophion's answer, but you can construct a normalized cumulative density function, then choose based on a uniform random number:

from __future__ import division import numpy as np import matplotlib.pyplot as plt  data = np.random.normal(size=1000) hist, bins = np.histogram(data, bins=50)  bin_midpoints = bins[:-1] + np.diff(bins)/2 cdf = np.cumsum(hist) cdf = cdf / cdf[-1] values = np.random.rand(10000) value_bins = np.searchsorted(cdf, values) random_from_cdf = bin_midpoints[value_bins]  plt.subplot(121) plt.hist(data, 50) plt.subplot(122) plt.hist(random_from_cdf, 50) plt.show()

enter image description here

A 2D case can be done as follows:

data = np.column_stack((np.random.normal(scale=10, size=1000),                         np.random.normal(scale=20, size=1000))) x, y = data.T                         hist, x_bins, y_bins = np.histogram2d(x, y, bins=(50, 50)) x_bin_midpoints = x_bins[:-1] + np.diff(x_bins)/2 y_bin_midpoints = y_bins[:-1] + np.diff(y_bins)/2 cdf = np.cumsum(hist.ravel()) cdf = cdf / cdf[-1]  values = np.random.rand(10000) value_bins = np.searchsorted(cdf, values) x_idx, y_idx = np.unravel_index(value_bins,                                 (len(x_bin_midpoints),                                  len(y_bin_midpoints))) random_from_cdf = np.column_stack((x_bin_midpoints[x_idx],                                    y_bin_midpoints[y_idx])) new_x, new_y = random_from_cdf.T  plt.subplot(121, aspect='equal') plt.hist2d(x, y, bins=(50, 50)) plt.subplot(122, aspect='equal') plt.hist2d(new_x, new_y, bins=(50, 50)) plt.show()

enter image description here

answered Sep 24 '22 10:09

Jaime

@Jaime solution is great, but you should consider using the kde (kernel density estimation) of the histogram. A great explanation why it's problematic to do statistics over histogram, and why you should use kde instead can be found here

I edited @Jaime's code to show how to use kde from scipy. It looks almost the same, but captures better the histogram generator.

from __future__ import division import numpy as np import matplotlib.pyplot as plt from scipy.stats import gaussian_kde  def run():     data = np.random.normal(size=1000)     hist, bins = np.histogram(data, bins=50)      x_grid = np.linspace(min(data), max(data), 1000)     kdepdf = kde(data, x_grid, bandwidth=0.1)     random_from_kde = generate_rand_from_pdf(kdepdf, x_grid)      bin_midpoints = bins[:-1] + np.diff(bins) / 2     random_from_cdf = generate_rand_from_pdf(hist, bin_midpoints)      plt.subplot(121)     plt.hist(data, 50, normed=True, alpha=0.5, label='hist')     plt.plot(x_grid, kdepdf, color='r', alpha=0.5, lw=3, label='kde')     plt.legend()     plt.subplot(122)     plt.hist(random_from_cdf, 50, alpha=0.5, label='from hist')     plt.hist(random_from_kde, 50, alpha=0.5, label='from kde')     plt.legend()     plt.show()   def kde(x, x_grid, bandwidth=0.2, **kwargs):     """Kernel Density Estimation with Scipy"""     kde = gaussian_kde(x, bw_method=bandwidth / x.std(ddof=1), **kwargs)     return kde.evaluate(x_grid)   def generate_rand_from_pdf(pdf, x_grid):     cdf = np.cumsum(pdf)     cdf = cdf / cdf[-1]     values = np.random.rand(1000)     value_bins = np.searchsorted(cdf, values)     random_from_cdf = x_grid[value_bins]     return random_from_cdf

enter image description here

answered Sep 24 '22 10:09

Noam Peled

Related questions
                            
                                How to generate url from boto3 in amazon web services
                            
                                Beautiful Soup if Class "Contains" or Regex?
                            
                                Can I use index information inside the map function?
                            
                                Python atan or atan2, what should I use?
                            
                                How to graph grid scores from GridSearchCV?
                            
                                Pandas Extract Number from String
                            
                                Python natural smoothing splines
                            
                                Jupyter command `jupyter-lab` not found
                            
                                Syntax to call random function from a list [duplicate]
                            
                                How do I add a header to urllib2 opener?
                            
                                How to install virtualenv without using sudo?
                            
                                Retrieve all items from DynamoDB using query?
                            
                                Set column name for apply result over groupby
                            
                                Pass a dict to scikit learn estimator
                            
                                Reading JSON file with Python 3
                            
                                Django equivalent of COUNT with GROUP BY
                            
                                How to get plain text out of Wikipedia
                            
                                All possible variants of zip in Python
                            
                                Right way to clean up a temporary folder in Python class
                            
                                Calling C functions in Python

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Random Number from Histogram

Tags:

python

numpy

scipy

montecarlo

xvtk

People also ask

2 Answers

Jaime

Noam Peled

Recent Activity

Donate For Us