
Given a 2D Numpy array representing a 2D distribution, how to sample data from this distribution with the aid of Numpy or Scipy functions?

I have a 2D NumPy array dist with shape (200, 200), where each entry is the joint probability of (x1, x2) for x1, x2 ∈ {0, 1, ..., 199}. How do I sample bivariate data x = (x1, x2) from this probability distribution using the NumPy or SciPy API?

jabberwoo asked May 07 '19 06:05




2 Answers

This solution works for probability distributions with any number of dimensions, provided the array is a valid probability distribution (its entries must be non-negative and sum to 1). It flattens the distribution, samples an index from the flattened array, and converts that flat index back into an index tuple matching the original array's shape.

import numpy as np

# `array` is the N-dimensional probability array (`dist` in the question)
# Create a flat copy of the array
flat = array.flatten()

# Then, sample an index from the 1D array with the
# probability distribution from the original array
sample_index = np.random.choice(a=flat.size, p=flat)

# Take this index and adjust it so it matches the original array
adjusted_index = np.unravel_index(sample_index, array.shape)
print(adjusted_index)

To draw multiple samples at once, add a size keyword argument to the np.random.choice call and modify adjusted_index before printing it:

adjusted_index = np.array(list(zip(*adjusted_index)))

This is necessary because, when given an array of flat indices, np.unravel_index returns a tuple with one array of indices per dimension; zipping them produces one coordinate tuple per sample. (In Python 3, zip returns an iterator, so it must be wrapped in list before being passed to np.array.) Drawing all samples in one call is also much faster than running the single-sample code repeatedly.
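
Putting the steps above together, a minimal self-contained sketch might look like the following. The helper name sample_joint and the toy 3x4 distribution are illustrative assumptions, not part of the original answer, and the sketch uses the newer np.random.default_rng Generator rather than the legacy np.random.choice function.

```python
import numpy as np

def sample_joint(dist, n, rng=None):
    """Draw n index tuples from an N-dimensional pmf (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    dist = np.asarray(dist, dtype=float)
    # Sample n flat indices with probabilities taken from the flattened pmf
    flat_idx = rng.choice(dist.size, size=n, p=dist.ravel())
    # Convert flat indices to one index array per dimension, then stack
    # them column-wise into an (n, dist.ndim) array of coordinates
    return np.column_stack(np.unravel_index(flat_idx, dist.shape))

# Toy pmf (assumption for illustration): all mass sits on two cells
dist = np.zeros((3, 4))
dist[0, 3] = 0.25
dist[2, 1] = 0.75

samples = sample_joint(dist, n=1000, rng=np.random.default_rng(0))
print(samples.shape)  # (1000, 2)
```

Each row of samples is one (x1, x2) draw; np.column_stack plays the same role as the zip step but returns a NumPy array directly.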


Relevant documentation:

  • np.random.choice
  • np.unravel_index
applemonkey496 answered Oct 29 '22 21:10


Here's one way, though I'm sure there's a more elegant solution using SciPy. numpy.random doesn't handle 2D pmfs directly, so you have to do some reshaping gymnastics to go this way.

import numpy as np

# construct a toy joint pmf
dist=np.random.random(size=(200,200)) # here's your joint pmf 
dist/=dist.sum() # it has to be normalized 

# generate the set of all x,y pairs represented by the pmf
pairs=np.moveaxis(np.indices(dimensions=(200,200)),0,-1) # all (x1,x2) pairs, ordered so that pairs[i,j] == (i,j) matches dist[i,j]

# make n random selections from the flattened pmf without replacement
# (no index is drawn twice); for independent draws from the pmf,
# use replace=True, which is the default
n=50 
inds=np.random.choice(np.arange(200**2),p=dist.reshape(-1),size=n,replace=False)

# inds is the set of n randomly chosen indices into the flattened dist array,
# so the random (x1,x2) selections come from picking the associated rows
# of the flattened pairs array
selections = pairs.reshape(-1,2)[inds]
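
One way to check that the sampled pairs line up with the rows and columns of dist is to use a small, deliberately asymmetric pmf and compare empirical frequencies with the target. The sketch below is an illustration rather than part of the original answer: the 2x3 pmf, the seed, and the sample count are arbitrary assumptions, and it samples with replacement so that the draws are independent.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only

# Small asymmetric pmf (assumed for illustration); rows index x1, columns x2
dist = np.array([[0.10, 0.20, 0.30],
                 [0.05, 0.15, 0.20]])

# All (x1, x2) pairs in the same row-major order as dist.reshape(-1)
pairs = np.moveaxis(np.indices(dist.shape), 0, -1).reshape(-1, 2)

# Independent draws (with replacement) of flat indices into dist
n = 100_000
inds = rng.choice(dist.size, size=n, p=dist.reshape(-1))
selections = pairs[inds]

# Empirical frequency of each (x1, x2) cell
counts = np.zeros(dist.shape)
np.add.at(counts, (selections[:, 0], selections[:, 1]), 1)
empirical = counts / n

# Largest deviation from the target pmf; it should shrink as n grows
print(np.abs(empirical - dist).max())
```

If the pairs were built in the wrong axis order, the empirical table would match the transpose of dist instead, which for a non-square pmf like this one shows up immediately as an indexing error.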
kevinkayaks answered Oct 29 '22 22:10