Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pythonic way to generate random uniformly distributed points within hollow square lamina

Suppose we have a hollow square lamina of size n. That is, we have a nxn square from which a k*l rectangle has been removed (1<=k,l<=n-2). I want to calculate an average of distances between 2 random, uniformly distributed points within such hollow square lamina. For the sake of simplicity let's consider n=3, k=l=1, or a 3x3 square from whose center a unit square has been removed

I wrote this code for numpy, but it has at least 2 problems: I have to throw away approximately 1/9 of all generated points and removing the numpy.array elements requires lots of RAM:

    x,y = 3*np.random.random((2,size,2))
x = x[
    np.logical_not(np.logical_and(
        np.logical_and(x[:,0] > 1, x[:,0] < 2),
        np.logical_and(x[:,1] > 1, x[:,1] < 2)
        ))
    ]
y = y[
    np.logical_not(np.logical_and(
        np.logical_and(y[:,0] > 1, y[:,0] < 2),
        np.logical_and(y[:,1] > 1, y[:,1] < 2)
        ))
    ]
n = min(x.shape[0], y.shape[0])

UPD:Here size is the sample size of which I'm going to calculate average. Is there an elegant way to generate those points right away, without removing the unfit ones?

UPD: Here is the full code just for the reference:

def calc_avg_dist(size):
    x,y = 3*np.random.random((2,size,2))
    x = x[
        np.logical_not(np.logical_and(
        np.logical_and(x[:,0] > 1, x[:,0] < 2),
        np.logical_and(x[:,1] > 1, x[:,1] < 2)
        ))
    ]
    y = y[
        np.logical_not(np.logical_and(
        np.logical_and(y[:,0] > 1, y[:,0] < 2),
        np.logical_and(y[:,1] > 1, y[:,1] < 2)
        ))
    ]
    n = min(x.shape[0], y.shape[0])
    diffs = x[:n,:] - y[:n,:]
    return np.sum(np.sqrt(np.einsum('ij,ij->i',diffs,diffs)))/n
like image 975
Nick Slavsky Avatar asked May 08 '16 14:05

Nick Slavsky


2 Answers

With the center removed, there are 8 regions that should contain points. These are their lower-left corners:

In [350]: llcorners = np.array([[0, 0], [1, 0], [2, 0], [0, 1], [2, 1], [0, 2], [1, 2], [2, 2]])

The regions are 1x1, so they have the same area and are equally likely to contain a given random point. The following chooses size lower-left corners:

In [351]: corner_indices = np.random.choice(len(llcorners), size=size)

Now generate size (x,y) coordinates in the unit square:

In [352]: unit_coords = np.random.random(size=(size, 2))

Add those to the lower-left corners chosen previously:

In [353]: pts = unit_coords + llcorners[corner_indices]

pts has shape (size, 2). Here's a plot, with size = 2000:

In [363]: plot(pts[:,0], pts[:,1], 'o')
Out[363]: [<matplotlib.lines.Line2D at 0x11000f950>]

plot


Update to address the updated question...

The following function generalizes the above idea to a rectangular shape containing a rectangular hollow. The rectangle is still considered to be nine regions, with the middle region being the hollow. The probability of a random point being in a region is determined by the area of the region; numpy.random.multinomial is used to select the number of points in each region.

(I'm sure there is room for optimization of this code.)

from __future__ import division

import numpy as np


def sample_hollow_lamina(size, outer_width, outer_height, a, b, inner_width, inner_height):
    """
    (a, b) is the lower-left corner of the "hollow".
    """
    llcorners = np.array([[0, 0], [a, 0], [a+inner_width, 0],
                          [0, b], [a+inner_width, b],
                          [0, b+inner_height], [a, b+inner_height], [a+inner_width, b+inner_height]])
    top_height = outer_height - (b + inner_height)
    right_width = outer_width - (a + inner_width)
    widths = np.array([a, inner_width, right_width, a, right_width, a, inner_width, right_width])
    heights = np.array([b, b, b, inner_height, inner_height, top_height, top_height, top_height])
    areas = widths * heights
    shapes = np.column_stack((widths, heights))

    regions = np.random.multinomial(size, areas/areas.sum())
    indices = np.repeat(range(8), regions)
    unit_coords = np.random.random(size=(size, 2))
    pts = unit_coords * shapes[indices] + llcorners[indices]

    return pts

For example,

In [455]: pts = sample_hollow_lamina(2000, 5, 5, 1, 1, 2, 3)

In [456]: plot(pts[:,0], pts[:,1], 'o', alpha=0.75)
Out[456]: [<matplotlib.lines.Line2D at 0x116da0a50>]

In [457]: grid()

plot

Note that the arguments do not have to be integers:

In [465]: pts = sample_hollow_lamina(2000, 3, 3, 0.5, 1.0, 1.5, 0.5)

In [466]: plot(pts[:,0], pts[:,1], 'o', alpha=0.75)
Out[466]: [<matplotlib.lines.Line2D at 0x116e60390>]

In [467]: grid()

plot

like image 110
Warren Weckesser Avatar answered Nov 12 '22 13:11

Warren Weckesser


I've already posted a shorter and somehow unclear answer, here I've taken the time to produce what I think is a better answer.


Generalizing the OP problem, we have a "surface" composed of nsc * nsr squares disposed in nsc columns and nsr rows, and a "hole" composed of nhc * nhr squares (corresponding to the surface squares) disposed in a rectangular arrangement with nhr rows and nhc columns, with the origin placed at ohc, ohr

    +------+------+------+------+------+------+------+------+------+ nsr  
    |      |      |      |      |      |      |      |      |      |  
    |      |      |      |      |      |      |      |      |      |  
 ...+------+------+------+------+------+------+------+------+------+ nsr-1  
    |      |      | oooo | oooo |      |      |      |      |      |  
    |      |      | oooo | oooo |      |      |      |      |      |  
    +------+------+------+------+------+------+------+------+------+ ...  
    |      |      | oooo | oooo |      |      |      |      |      |  
    |      |      | oooo | oooo |      |      |      |      |      |  
    +------+------+------+------+------+------+------+------+------+ ...  
    |      |      | oooo | oooo |      |      |      |      |      |  
    |      |      | oooo | oooo |      |      |      |      |      |  
 ohc+------+------+------+------+------+------+------+------+------+ 1  
    |      |      |      |      |      |      |      |      |      |  
    |      |      |      |      |      |      |      |      |      |  
    +------+------+------+------+------+------+------+------+------+ 0  
    0      1      2     ...                         ...   nsc-1   nsc  
                  |             |  
                ohr=2        ohr+nhr

Our aim is to draw n random points from the surface without the hole (the admissible surface) with a uniform distribution over the admissible surface.

We observe that the admissible area is composed of nsq = nsc*nsr -nhc*nhr equal squares: if we place a random point in an abstract unit square and then assign, with equal probability, that point to one of the squares of the admissible area, we have done our job.

In pseudocode, if random() samples from an uniformly distributed random variable over [0, 1)

(x, y) = (random(), random())
square = integer(random()*nsq)
(x, y) = (x, y) + (column_of_square_origin(square), row_of_square_origin(square))

To speed up the process, we are using numpy and we try to avoid, as far as possible, explicit for loops.

With the names previously used, we need a listing of the origins of the admissible squares, to implement the last line of the pseudo code.

def origins_of_OK_squares(nsc, nsr, ohc, ohr, nhc, nhr):

    # a set of tuples, each one the origin of a square, for ALL the squares
    s_all = {(x, y) for x in range(nsc) for y in range(nsr)}

    # a set of tuples with the origin of the hole squares
    s_hole = {(x, y) for x in range(ohc,ohc+nhc) for y in range(ohr,ohr+nhr)}

    # the set of the origins of admissible squares is the difference
    s_adm = s_all - s_hole

    # return an array with all the origins --- the order is not important!
    # np.array doesn't like  sets
    return np.array(list(s_adm))

We need to generate n random points in the unit square, organized in an array of shape (n,2)

    rand_points = np.random.random((n, 2))

We need an array of n admissible squares

    placements = np.random.randint(0, nsq, n)

We translate each point in rand_points in one of the admissible squares, as specified by the elements of placements.

    rand_points += origins_of_OK_squares(nsc, nsr, ohc, ohr, nhc, nhr)[placements]

taking advantage of the extended addressing that it is possible for numpy arrays, and its done...

In an overly compact function

import numpy as np
def samples_wo_hole(n,  nsc, nsr, ohc, ohr, nhc, nhr):
    s_all = {(x, y) for x in range(nsc) for y in range(nsr)}
    s_hole = {(x, y) for x in range(ohc,ohc+nhc) for y in range(ohr,ohr+nhr)}
    rand_points = np.random.random((n, 2))
    placements = np.random.randint(0, nsc*nsr - nhc*nhr, n)
    return rand_points+np.array(list(s_all-s_hole))[placements]
like image 22
gboffi Avatar answered Nov 12 '22 12:11

gboffi