Given a 2D Numpy array representing a 2D distribution, how to sample data from this distribution with the aid of Numpy or Scipy functions?

Tags: python, arrays, python-3.x, numpy, scipy

I have a 2D NumPy array dist with shape (200, 200), where each entry is the joint probability of (x1, x2) for x1, x2 ∈ {0, 1, ..., 199}. How do I sample bivariate data x = (x1, x2) from this probability distribution using the NumPy or SciPy API?

373

asked May 07 '19 06:05

jabberwoo

2 Answers

This solution works with probability distributions of any number of dimensions, assuming the array is a valid probability distribution (its entries must be non-negative and sum to 1). It flattens the distribution, samples an index from the flattened array, and converts that index back into coordinates in the original array shape.

import numpy as np

# array is the N-dimensional probability distribution (entries sum to 1)
# Create a flat copy of the array
flat = array.flatten()

# Then, sample an index from the 1D array with the
# probability distribution from the original array
sample_index = np.random.choice(a=flat.size, p=flat)

# Take this index and adjust it so it matches the original array
adjusted_index = np.unravel_index(sample_index, array.shape)
print(adjusted_index)

Also, to get multiple samples, add a size keyword argument to the np.random.choice call, and modify adjusted_index before printing it:

adjusted_index = np.array(list(zip(*adjusted_index)))

This is necessary because, when sample_index is an array, np.unravel_index returns a tuple with one array of indices per coordinate dimension; zipping them produces a list of coordinate tuples. The list() call is needed in Python 3, where zip returns an iterator that np.array would not expand. Drawing all samples in one call is also much more efficient than repeating the first snippet in a loop.
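Putting the pieces together, here is a minimal self-contained sketch of the multi-sample version. The toy distribution, the helper name sample_from_pmf, and the use of np.random.default_rng and np.column_stack (in place of zip) are assumptions made for illustration, not part of the answer above.

```python
import numpy as np

def sample_from_pmf(pmf, n, rng=None):
    """Draw n index tuples from an N-dimensional pmf (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    flat = pmf.ravel()
    # Sample n flat indices (with replacement), weighted by the pmf entries
    flat_idx = rng.choice(flat.size, size=n, p=flat)
    # Convert flat indices to one index array per dimension, then stack
    # them column-wise into an (n, pmf.ndim) array of coordinates
    return np.column_stack(np.unravel_index(flat_idx, pmf.shape))

# Toy joint pmf with the shape from the question (assumed data)
rng = np.random.default_rng(0)
dist = rng.random((200, 200))
dist /= dist.sum()

samples = sample_from_pmf(dist, n=1000, rng=rng)
print(samples.shape)  # (1000, 2)
print(samples[:3])    # first three (x1, x2) pairs
```

Each row of samples is one (x1, x2) draw, so samples[:, 0] and samples[:, 1] give the two coordinates separately.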

Relevant documentation:

np.random.choice
np.unravel_index

103

answered Oct 29 '22 21:10

applemonkey496

Here's a way, but I'm sure there's a much more elegant solution using scipy. numpy.random doesn't deal with 2d pmfs, so you have to do some reshaping gymnastics to go this way.

import numpy as np

# construct a toy joint pmf
dist = np.random.random(size=(200, 200))  # here's your joint pmf
dist /= dist.sum()  # it has to be normalized

# generate the set of all (x1, x2) pairs represented by the pmf;
# moving the leading axis of np.indices to the end gives pairs[i, j] == (i, j),
# so the flattened pairs line up with the flattened pmf
pairs = np.moveaxis(np.indices((200, 200)), 0, -1)

# make n random selections from the flattened pmf without replacement
# whether you want replacement depends on your application
n = 50
inds = np.random.choice(dist.size, p=dist.reshape(-1), size=n, replace=False)

# inds is the set of n randomly chosen indices into the flattened dist array,
# so the random (x1, x2) selections are the associated elements
# of the flattened pairs array
selections = pairs.reshape(-1, 2)[inds]
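As the comment in the code notes, whether to sample with replacement depends on the application. With replace=False each cell can be chosen at most once, so the draws are not independent and n cannot exceed the number of cells with nonzero probability. Below is a sketch of the with-replacement variant on a small toy pmf (the 2x3 array, the sample size, and the use of np.random.default_rng are assumptions chosen for illustration), with a rough check that the empirical frequencies approach the pmf.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small toy joint pmf over (x1, x2) with x1 in {0, 1} and x2 in {0, 1, 2}
dist = np.array([[0.10, 0.20, 0.30],
                 [0.05, 0.05, 0.30]])

n = 100_000
# Independent draws: the same cell may be selected many times
inds = rng.choice(dist.size, size=n, p=dist.ravel(), replace=True)
x1, x2 = np.unravel_index(inds, dist.shape)

# Count how often each (x1, x2) cell was drawn and normalize
counts = np.zeros_like(dist)
np.add.at(counts, (x1, x2), 1)
empirical = counts / n

# Largest deviation between empirical frequencies and the pmf
print(np.abs(empirical - dist).max())
```

For i.i.d. samples from the joint distribution, sampling with replacement is usually what is wanted; the printed deviation should shrink as n grows.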

answered Oct 29 '22 22:10

kevinkayaks
