Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding the distance between two polygons in numpy

I have two polygons, P and Q, where the exterior linear ring of a polygon is defined by two closed sets of points, stored as numpy arrays, connected in a counterclockwise direction. P and Q are in the following format:

P['x_coords'] = [299398.56 299402.16 299410.25 299419.7  299434.97 299443.75 299454.1 299465.3  299477.   299488.25 299496.8  299499.5  299501.28 299504. 299511.62 299520.62 299527.8  299530.06 299530.06 299525.12 299520.2 299513.88 299508.5  299500.84 299487.34 299474.78 299458.6  299444.66 299429.8  299415.4  299404.84 299399.47 299398.56 299398.56] 
P['y_coords'] = [822975.2  822989.56 823001.25 823005.3  823006.7  823005.06 823001.06 822993.4  822977.2  822961.   822943.94 822933.6  822925.06 822919.7 822916.94 822912.94 822906.6  822897.6  822886.8  822869.75 822860.75 822855.8  822855.4  822857.2  822863.44 822866.6  822870.6  822876.94 822886.8  822903.   822920.3  822937.44 822954.94 822975.2]

Q['x_coords'] = [292316.94 292317.94 292319.44 292322.47 292327.47 292337.72 292345.75 292350.   292352.75 292353.5  292352.25 292348.75 292345.75 292342.5 292338.97 292335.97 292333.22 292331.22 292329.72 292324.72 292319.44 292317.2  292316.2  292316.94]
Q['y_coords'] = [663781.   663788.25 663794.   663798.06 663800.06 663799.3  663796.56 663792.75 663788.5  663782.   663773.25 663766.   663762.   663758.25 663756.5  663756.25 663757.5  663761.   663763.75 663767.5  663769.5 663772.25 663777.5  663781.  ]

## SIMPLIFIED AND FORMATTED FOR EASY TESTING:
import numpy as np

px_coords = np.array([299398,299402,299410.25,299419.7,299398])
py_coords = np.array([822975.2,822920.3,822937.44,822954.94,822975.2])

qx_coords = np.array([292316,292331.22,292329.72,292324.72,292319.44,292317.2,292316])
qy_coords = np.array([663781,663788.25,663794,663798.06,663800.06,663799.3,663781])

The exterior ring of P is formed by joining P['x_coords'][0], P['y_coords'][0] -> P['x_coords'][1], P['y_coords'][1] etc. The last coordinate of each array is the same as the first, indicating that the shape is topologically closed.

Is it possible to calculate a simple minimum distance between the exterior rings of P and Q geometrically using numpy? I have searched high and low on SO without finding anything explicit, so I suspect this may be a drastic oversimplification of a very complex problem. I am aware that distance calculations can be done with out-of-the-box spatial libraries such as GDAL or Shapely, but I'm keen to understand how these work by building something from scratch in numpy.

Some things I have considered or tried:

  • Calculate the distance between each point in both arrays. This doesn't work as the closest point between P and Q can be an edge-vertex pair. Using the convex hull of each shape, calculated using scipy.spatial has the same problem.
  • An inefficient brute force approach calculating the distance between every pair of points, and every combination of edge-point pairs

Is there a better way to go about this problem?

like image 272
bm13563 Avatar asked Nov 06 '22 13:11

bm13563


2 Answers

There are many variations on a k-d tree for storing objects with extent, like the edges of your polygons. The approach with which I am most familiar (but have no link for) involves associating an axis-aligned bounding box with each node; the leaves correspond to the objects, and an internal node’s box is the smallest enclosing both of its children’s (which in general overlap). The usual median-cut approach is applied to the midpoints of the object’s boxes (for line segments, this is their midpoint).

Having built these for each polygon, the following dual recursion finds the closest approach:

def closest(k1,k2,true_dist):
  return _closest(k1,0,k2,0,true_dist,float("inf"))

def _closest(k1,i1,k2,i2,true_dist,lim):
  b1=k1.bbox[i1]
  b2=k2.bbox[i2]
  # Call leaves their own single children:
  cc1=k1.child[i1] or (i1,)
  cc2=k2.child[i2] or (i2,)
  if len(cc1)==1 and len(cc2)==1:
    return min(lim,true_dist(i1,i2))
  # Consider 2 or 4 pairs of children, possibly-closest first:
  for md,c1,c2 in sorted((min_dist(k1.bbox[c1],k2.bbox[c2]),c1,c2)
                         for c1 in cc1 for c2 in cc2):
    if md>=lim: break
    lim=min(lim,_closest(k1,c1,k2,c2,true_dist,lim)
  return lim

Notes:

  • The closest approach true_dist between two non-intersecting line segments must involve at least one endpoint.
  • The distance between a point and a segment can be greater than that between the point and the line containing the segment.
  • No point-point checks are needed: such a pair will be found (four times) via the adjacent edges.
  • The bounding-box arguments to min_dist may be overlapping, in which case it must return 0.
like image 100
Davis Herring Avatar answered Nov 14 '22 21:11

Davis Herring


Thanks to Davis Herring for his answer - his is not the solution I ended up using (because I'm not very familiar with recursion) but I used the principals he outlined to develop a solution. I am planning on building an index into this solution, as suggested by Davis, to help with very large polygons.

I ended using a brute force approach that compares the distance between each edge of both polygons against each other, calculates the distance, and keeps track the minimum distance. I adapted the answers provided in this question: Shortest distance between two line segments. This method is very loop heavy and was running very slowly, so I adapted it to run in cython to improve efficiency.

pure python
shape a edges: 33
shape b edges: 15
total loops: 1000
total time = 6.889256715774536
average time per loop = 0.006896152868643179
max time per loop = 0.022176027297973633
min time per loop = 0.0

cython loop
shape a edges: 33
shape b edges: 15
total loops: 1000
total time = 0.046829938888549805
average time per loop = 4.687681570425406e-05
max time per loop = 0.015621423721313477
min time per loop = 0.0

I have attached the pure python version of the code below for clarity, and can provide the cython one if needed.

import numpy as np
import time
import math


def segments_distance(x11, y11, x12, y12, x21, y21, x22, y22):
    if segments_intersect(x11, y11, x12, y12, x21, y21, x22, y22): return 0
    distances = []
    distances.append(point_segment_distance(x11, y11, x21, y21, x22, y22))
    distances.append(point_segment_distance(x12, y12, x21, y21, x22, y22))
    distances.append(point_segment_distance(x21, y21, x11, y11, x12, y12))
    distances.append(point_segment_distance(x22, y22, x11, y11, x12, y12))
    return min(distances)


def segments_intersect(x11, y11, x12, y12, x21, y21, x22, y22):
    dx1 = x12 - x11
    dy1 = y12 - y11
    dx2 = x22 - x21
    dy2 = y22 - y21
    delta = dx2 * dy1 - dy2 * dx1
    if delta == 0: return False  # parallel segments
    s = (dx1 * (y21 - y11) + dy1 * (x11 - x21)) / delta
    t = (dx2 * (y11 - y21) + dy2 * (x21 - x11)) / (-delta)
    return (0 <= s <= 1) and (0 <= t <= 1)


def point_segment_distance(px, py, x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    if dx == dy == 0:  # the segment's just a point
        return math.hypot(px - x1, py - y1)

    # Calculate the t that minimizes the distance.
    t = ((px - x1) * dx + (py - y1) * dy) / (dx * dx + dy * dy)

    # See if this represents one of the segment's
    # end points or a point in the middle.
    if t < 0:
        dx = px - x1
        dy = py - y1
    elif t > 1:
        dx = px - x2
        dy = py - y2
    else:
        near_x = x1 + t * dx
        near_y = y1 + t * dy
        dx = px - near_x
        dy = py - near_y

    return math.hypot(dx, dy)

px_coords=np.array([299398.56,299402.16,299410.25,299419.7,299434.97,299443.75,299454.1,299465.3,299477.,299488.25,299496.8,299499.5,299501.28,299504.,299511.62,299520.62,299527.8,299530.06,299530.06,299525.12,299520.2,299513.88,299508.5,299500.84,299487.34,299474.78,299458.6,299444.66,299429.8,299415.4,299404.84,299399.47,299398.56,299398.56])
py_coords=np.array([822975.2,822989.56,823001.25,823005.3,823006.7,823005.06,823001.06,822993.4,822977.2,822961.,822943.94,822933.6,822925.06,822919.7,822916.94,822912.94,822906.6,822897.6,822886.8,822869.75,822860.75,822855.8,822855.4,822857.2,822863.44,822866.6,822870.6,822876.94,822886.8,822903.,822920.3,822937.44,822954.94,822975.2])
qx_coords=np.array([384072.1,384073.2,384078.9,384085.7,384092.47,384095.3,384097.12,384097.12,384093.9,384088.9,384082.47,384078.9,384076.03,384074.97,384073.53,384072.1])
qy_coords=np.array([780996.8,781001.1,781003.6,781003.6,780998.25,780993.25,780987.9,780981.8,780977.5,780974.7,780974.7,780977.2,780982.2,780988.25,780992.5,780996.8])

px_edges = np.stack((px_coords, np.roll(px_coords, -1)),1)
py_edges = np.stack((py_coords, np.roll(py_coords, -1)),1)
p_edges = np.stack((px_edges, py_edges), axis=-1)[:-1]

qx_edges = np.stack((qx_coords, np.roll(qx_coords, -1)),1)
qy_edges = np.stack((qy_coords, np.roll(qy_coords, -1)),1)
q_edges = np.stack((qx_edges, qy_edges), axis=-1)[:-1]

timings = []
for i in range(1,1000):
    start = time.time()

    edge_distances = [segments_distance(p_edges[n][0][0],p_edges[n][0][1],p_edges[n][1][0],p_edges[n][1][1],q_edges[m][0][0],q_edges[m][0][1],q_edges[m][1][0],q_edges[m][1][1]) for m in range(0,len(q_edges)) for n in range(0,len(p_edges))]

    end = time.time() - start
    timings.append(end)

print(f'shape a edges: {len(px_coords)}')
print(f'shape b edges: {len(qy_coords)}')
print(f'total loops: {i+1}')
print(f'total time = {sum(timings)}')
print(f'average time per loop = {sum(timings)/len(timings)}')
print(f'max time per loop = {max(timings)}')
print(f'min time per loop = {min(timings)}')
like image 26
bm13563 Avatar answered Nov 14 '22 23:11

bm13563