Problem
I need to construct a 2D grid from a set of candidate positions (values in X and Y). However, there may be false positive candidates that should be filtered out, as well as false negatives, where a position needs to be created at the expected location given the surrounding positions' values. The rows and columns of the grid can be expected to be straight, and the rotation, if any, small.
Further, I don't have reliable information on where the (0, 0) grid position is. However, I do know:
grid_size = (4, 4)
expected_distance = 105
(The expected distance is just a rough estimate of the spacing between grid points, and should be allowed to vary within about 10%.)
Example Data
This is the ideal data, with no false positives and no false negatives. The algorithm needs to be able to cope when several of these data points are removed and false ones are added as well.
X = np.array([61.43283582, 61.56626506, 62.5026738, 65.4028777, 167.03030303, 167.93965517, 170.82191781, 171.37974684, 272.02884615, 272.91089109, 274.1031746, 274.22891566, 378.81553398, 379.39534884, 380.68181818, 382.67164179])
Y = np.array([55.14427861, 160.30120482, 368.80213904, 263.12230216, 55.1030303, 263.64655172, 162.67123288, 371.36708861, 55.59615385, 264.64356436, 368.20634921, 158.37349398, 54.33980583, 160.55813953, 371.72727273, 266.68656716])
Code
The following function evaluates the candidates and returns two dictionaries.
The first one has each candidate position (as a 2-length tuple) as keys, and values that are 2-length tuples holding that position's right and below neighbours (using the logic of how images are displayed). Those neighbours are themselves either a 2-length coordinate tuple or None.
The second dictionary is a reverse lookup of the first, such that each candidate (position) has a list of other candidates' positions supporting it.
import numpy as np
from collections import defaultdict


def get_neighbour_grid(X, Y, expect_dist=(105, 105)):

    # Accept neighbours whose spacing is within roughly +/- 10% of the expected distance.
    t1 = (expect_dist[0] + expect_dist[1]) / 2.0 * 0.9
    t2 = t1 * 1.222

    def neighbours(x, y):

        # Closest candidate to the ideal position one step to the right.
        nRight = None
        ideal = x + expect_dist[0]
        D = np.sqrt((X - ideal) ** 2 + (Y - y) ** 2)
        candidate = (X[D.argmin()], Y[D.argmin()])
        if candidate != (x, y) and x + t2 > candidate[0] > x + t1:
            nRight = candidate

        # Closest candidate to the ideal position one step below.
        nBelow = None
        ideal = y + expect_dist[1]
        D = np.sqrt((X - x) ** 2 + (Y - ideal) ** 2)
        candidate = (X[D.argmin()], Y[D.argmin()])
        if candidate != (x, y) and y + t2 > candidate[1] > y + t1:
            nBelow = candidate

        return nRight, nBelow

    right_below_neighbours = dict()
    reverse_lookup = defaultdict(list)

    for pos in np.arange(X.size):
        pos_tuple = (X[pos], Y[pos])
        n = neighbours(*pos_tuple)
        right_below_neighbours[pos_tuple] = n
        reverse_lookup[n[0]].append(pos_tuple)
        reverse_lookup[n[1]].append(pos_tuple)

    return right_below_neighbours, reverse_lookup
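As a quick sanity check, the function can be exercised directly on the example data above; this sketch only assumes the X and Y arrays and the function already defined, and prints the neighbour relations found for the first candidate:

right_below, reverse = get_neighbour_grid(X, Y)

first = (X[0], Y[0])
print(first, "->", right_below[first])      # (right neighbour, below neighbour), either may be None
print("supported by:", reverse.get(first, []))  # candidates listing `first` as one of their neighbours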
Here is where I get stuck:
How do I use these dictionaries and/or X and Y to construct the most supported grid?
I had an idea of starting with the lower, rightmost candidate supported by two neighbours and iteratively creating the grid using the reverse_lookup dictionary. But that design has several flaws, the most apparent being that I cannot count on having detected the lower, rightmost candidate and both its supporting neighbours.
The code for that, though it won't run since I abandoned it when I realized how problematic it was (pre_grid = right_below_neighbours):
def build_grid(pre_grid, reverse_lookup, grid_shape=(4, 4)):

    # Count how many candidates list each position as one of their neighbours.
    grid_pos_support = defaultdict(int)
    unsupported = 0
    for l, b in pre_grid.values():
        if l is not None:
            grid_pos_support[l] += 1
        else:
            unsupported += 1
        if b is not None:
            grid_pos_support[b] += 1
        else:
            unsupported += 1

    # Keep only positions supported by at least two neighbours.
    well_supported = list()
    for pos in grid_pos_support:
        if grid_pos_support[pos] >= 2:
            well_supported.append(pos)

    well_A = np.asarray(well_supported)
    ur_pos = well_A[well_A.sum(axis=1).argmax()]

    grid = np.zeros(grid_shape + (2,), dtype=float)
    grid[-1, -1, :] = ur_pos

    def _iter_build_grid(pos, ref_pos=None):
        isX = pre_grid[tuple(pos)][0] == ref_pos
        if ref_pos is not None:
            oldCoord = [c[0] for c in np.where(grid == ref_pos)[:-1]]
            myCoord = (oldCoord[0] - int(isX), oldCoord[1] - int(not isX))
        for p in reverse_lookup[tuple(pos)]:
            _iter_build_grid(p, pos)

    _iter_build_grid(ur_pos)

    return grid
The first part could be useful though, since it sums up the support for each position. It also shows what I would need as a final output (grid):
A 3D array whose first two dimensions have the shape of the grid, and whose third dimension has length 2 (the x-coordinate and y-coordinate of each position).
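To make that output format concrete, here is a small sketch of the array layout and of how the lookup from new grid coordinates back to old coordinates would work (the coordinate values below are placeholders, not computed results):

grid_shape = (4, 4)
grid = np.zeros(grid_shape + (2,))   # grid[i, j] holds the (x, y) of grid position (i, j)

grid[0, 0] = (55.0, 55.0)            # placeholder: reconstructed coordinates of position (0, 0)
x, y = grid[0, 0]                    # recover the old coordinates from the new (0, 0)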
Recap
So I realize how my attempt was useless, but I'm at a loss as to how to make a global evaluation of all candidates and place the most supported grid using the candidates' x and y values wherever they fit. As this is, I expect, a quite complex question, I don't really expect anyone to give a complete solution (though it would be great), but any hint as to what type of algorithms or numpy/scipy functions could be used would be much appreciated.
Finally, sorry for this being a somewhat lengthy question.
Edit
Drawing of what I want to happen:
The stars/dots are the X and Y plotted with two modifications: I removed the first position and added a false one, to make this a full example of the sought algorithm.
What I want, in other words, is to map the red-circled positions to their new coordinate values (the ones written beside them), so that I can obtain the old coordinates from the new (e.g. (1, 1) -> (170.82191781, 162.67123288)). I also want points that don't approximate the ideal grid described by the true points to be discarded (as shown), and finally the empty ideal grid positions (blue circle) to be 'filled' using the ideal grid parameters (roughly (0, 0) -> (55, 55)).
Solution
I used the code @skymandr supplied to get the ideal parameters and then did the following (not the prettiest code, but it works). This means I'm no longer using the get_neighbour_grid function:
def build_grid(X, Y, x_offset, y_offset, dx, dy, grid_shape=(16, 24),
               square_distance_threshold=None):

    if square_distance_threshold is None:
        square_distance_threshold = ((dx + dy) / 2.0 * 0.05) ** 2

    grid = np.zeros(grid_shape + (2,), dtype=float)

    # Ranking matrix that orders the grid positions so that every position is
    # visited after the neighbour above it or to its left.
    D = np.zeros(grid_shape)
    for i in range(grid_shape[0]):
        for j in range(grid_shape[1]):
            D[i, j] = i * (1 + 1.0 / (grid_shape[0] + 1)) + j

    rD = D.ravel().copy()
    rD.sort()

    def find_valid(x, y):
        # Snap to an observed candidate if one is close enough to the ideal position.
        d = (X - x) ** 2 + (Y - y) ** 2
        valid = d < square_distance_threshold
        if valid.any():
            pos = d == d[valid].min()
            if pos.sum() == 1:
                return X[pos], Y[pos]
        return x, y

    x = x_offset
    y = y_offset
    first_loop = True

    for v in rD:
        # get new position
        coord = np.where(D == v)

        # generate a reference position already passed
        if coord[0][0] > 0:
            old_coord = (coord[0] - 1, coord[1])
        elif coord[1][0] > 0:
            old_coord = (coord[0], coord[1] - 1)

        if not first_loop:
            # calculate ideal step
            x, y = grid[old_coord].ravel()
            x += (coord[0] - old_coord[0]) * dx
            y += (coord[1] - old_coord[1]) * dy

        # modify with observed point close to ideal if one exists
        x, y = find_valid(x, y)

        # put in grid
        grid[coord] = np.array((x, y)).reshape(grid[coord].shape)

        first_loop = False

    return grid
This poses another question: how to nicely iterate along the diagonals of a 2D array, but I suppose that is worthy of a question of its own: More numpy way of iterating through the 'orthogonal' diagonals of a 2D array
Edit
Updated the solution code to better deal with larger grid sizes, so that it uses a neighbouring grid position that has already been passed as the reference for the ideal coordinate of every position. I still have to find a way to implement the better way of iterating through the grid from the linked question.
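For what it's worth, one possible way to get such a diagonal ordering without building and sorting the D ranking matrix is to group the indices by i + j, so every position is still visited after the neighbours above and to the left of it. iter_antidiagonals is only an illustrative helper, not the approach from the linked question:

def iter_antidiagonals(shape):
    # Yield (i, j) along the anti-diagonals i + j = 0, 1, 2, ...
    rows, cols = shape
    for s in range(rows + cols - 1):
        for i in range(max(0, s - cols + 1), min(rows, s + 1)):
            yield i, s - i

for i, j in iter_antidiagonals((4, 4)):
    pass  # fill grid[i, j] here, using grid[i - 1, j] or grid[i, j - 1] as the reference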
Answer
Here is a fairly simple and cheap solution, though I don't know how robust it is.
First of all, here's a way of getting a better estimate for the spacing:
leeway = 1.10

XX = X.reshape((1, X.size))
dX = np.abs(XX - XX.T).reshape((1, X.size ** 2))
dxs = dX[np.where(np.logical_and(dX > expected_distance / leeway,
                                 dX < expected_distance * leeway))]
dx = dxs.mean()

YY = Y.reshape((1, Y.size))
dY = np.abs(YY - YY.T).reshape((1, Y.size ** 2))
dys = dY[np.where(np.logical_and(dY > expected_distance / leeway,
                                 dY < expected_distance * leeway))]
dy = dys.mean()
The code computes all pairwise differences within X and within Y, and takes the mean of those that are within 10% of the desired spacing.
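Since the same computation is done for X and for Y, it can also be wrapped in a small helper; estimate_spacing below is just an illustrative name for this sketch, not part of the original answer:

def estimate_spacing(values, expected_distance, leeway=1.10):
    # Mean of all pairwise distances that lie within ~10% of the expected spacing.
    V = values.reshape((1, values.size))
    dV = np.abs(V - V.T).ravel()
    mask = (dV > expected_distance / leeway) & (dV < expected_distance * leeway)
    return dV[mask].mean()

dx = estimate_spacing(X, expected_distance)
dy = estimate_spacing(Y, expected_distance)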
For the second part, finding the offset of the grid, a similar method can be used:
Ndx = np.array([np.arange(grid_size[0])]) * dx
x_offsets = XX - Ndx.T
x_offset = np.median(x_offsets)
Ndy = np.array([np.arange(grid_size[1])]) * dy
y_offsets = YY - Ndy.T
y_offset = np.median(y_offsets)
Essentially, what this does is to let each position in X "vote" for NX = grid_size[0] positions where the bottom-left point might be, based on X - n * dx, where n = 0 is a vote for the point itself, n = 1 is a vote for a point one dx to the left, etc. This way, the points near the true origin will get the most votes, and the offset can be found using the median.
I think this method is sufficiently symmetric around the desired origin that the median can be used in most (if not all) cases. If, however, there are many false positives that make the median unreliable, the "true" origin can be found using e.g. a histogram method.
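With dx, dy, x_offset and y_offset computed as above, the build_grid function from the Solution section ties everything together. A minimal sketch of that last step, assuming grid_size = (4, 4) from the question:

grid = build_grid(X, Y, x_offset, y_offset, dx, dy, grid_shape=grid_size)

print(grid.shape)   # (4, 4, 2): x- and y-coordinate for each grid position
print(grid[0, 0])   # reconstructed coordinates of grid position (0, 0)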