Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Constructing a 2D grid from potentially incomplete list of candidates

Tags:

python

numpy

Problem

I need to construct a 2D grid using a set of candidate positions (values in X and Y). However, there may be false positive candidates that should be filtered out, as well as false negatives (where the position needs to be created for the expected position given the surrounding positions' values). The rows and columns of the grid can be expected to be straight, and the rotation, if any small.

Further, I don't have reliable information on where the (0, 0) grid position is. However I do know:

grid_size = (4, 4)

expected_distance = 105

(Excepted distance is just a rough estimate of the spacing between grid points, and should be allowed to vary in the range of 10%).

Example Data

This is the ideal data, with no false positives and no false negatives. The algorithm needs to be able to cope with removing several data-points and adding false ones as well.

X = np.array([61.43283582, 61.56626506, 62.5026738,   65.4028777, 167.03030303, 167.93965517, 170.82191781, 171.37974684, 272.02884615, 272.91089109, 274.1031746, 274.22891566, 378.81553398, 379.39534884, 380.68181818, 382.67164179])

Y = np.array([55.14427861, 160.30120482, 368.80213904, 263.12230216, 55.1030303, 263.64655172, 162.67123288, 371.36708861, 55.59615385, 264.64356436, 368.20634921, 158.37349398, 54.33980583, 160.55813953,  371.72727273,  266.68656716])

Code

The following function evaluates the candidates and returns two dictionaries.

The first one has each candidate position (as a 2-length tuple) as keys and values are 2-length tuples of the positions right and below neighbour (using logic from how images are displayed). Those neighbours are themselves either a 2-length tuple coordinate or a None.

The second dictionary is a reverse lookup of the first, such that each candidate (position) has a list of other candidates' positions supporting it.

import numpy as np
from collections import defaultdict

def get_neighbour_grid(X, Y, expect_dist=(105, 105)):

    t1 = (expect_dist[0] + expect_dist[1]) / 2.0 * 0.9
    t2 = t1 * 1.222

    def neighbours(x, y):

        nRight = None
        ideal = x + expect_dist[0]
        D = np.sqrt((X - ideal)**2 + (Y - y)**2)
        candidate = (X[D.argmin()], Y[D.argmin()])
        if candidate != (x, y) and x + t2 > candidate[0] > x + t1:
            nRight = candidate

        nBelow = None
        ideal = y + expect_dist[0]
        D = np.sqrt((X - x)**2 + (Y - ideal)**2)
        candidate = (X[D.argmin()], Y[D.argmin()])
        if candidate != (x, y) and y + t2 > candidate[1] > y + t1:
            nBelow = candidate

        return nRight, nBelow

    right_below_neighbours = dict()
    def _default_val(*args):
        return list()
    reverse_lookup = defaultdict(_default_val)

    for pos in np.arange(X.size):

        pos_tuple = (X[pos], Y[pos])
        n  = neighbours(*pos_tuple)
        right_below_neighbours[pos_tuple] = n
        reverse_lookup[n[0]].append(pos_tuple)
        reverse_lookup[n[1]].append(pos_tuple)

    return right_below_neighbours, reverse_lookup

Here is where I get stuck:

How do I use these dictionaries and/or X and Y to construct the most supported grid?

I had an idea for starting with the lower, rightmost candidate supported by 2 neighbours and iteratively create the grid using the reverse_lookup dictionary. But that design has several flaws, the most apparent being that I cannot count on having detected the lower, rightmost candidate and both its supporting neighbours.

The code for that, though it wont run since I abandoned it when I realized how problematic it was (pre_grid = right_below_neighbours):

def build_grid(pre_grid, reverse_lookup, grid_shape=(4, 4)):

    def _default_val(*args):
        return 0

    grid_pos_support = defaultdict(_default_val)
    unsupported = 0

    for l, b in pre_grid.values():

        if l is not None:
            grid_pos_support[l] += 1
        else:
            unsupported += 1
        if b is not None:
            grid_pos_support[b] += 1
        else:
            unsupported += 1

    well_supported = list()
    for pos in grid_pos_support:
        if grid_pos_support[pos] >= 2:
            well_supported.append(pos)

    well_A = np.asarray(well_supported)
    ur_pos = well_A[well_A.sum(axis=1).argmax()]

    grid = np.zeros(grid_shape + (2,), dtype=np.float)
    grid[-1,-1,:] = ur_pos

    def _iter_build_grid(pos, ref_pos=None):

        isX = pre_grid[tuple(pos)][0] == ref_pos
        if ref_pos is not None:
            oldCoord = map(lambda x: x[0], np.where(grid == ref_pos)[:-1])
            myCoord = (oldCoord[0] - int(isX), oldCoord[1] - int(not isiX))

        for p in reverse_lookup[tuple(pos)]:

            _iter_build_grid(p, pos)

    _iter_build_grid(ur_pos)

    return grid

The first part could be useful though, since it sums up the support for each position. It also shows what I would need as a final output (grid):

A 3D array with the 2 first dimensions the shape of the grid and the 3rd with length 2 (for x-coordinate and y-coordinate for each position).

Recap

So I realize how my attempt was useless, but I'm at loss as to how make a global evaluation of all candidates and place the most supported grid using the candidates' x and y values wherever fit. As this is, I expect, a quite complex question, I don't really expect anyone to give a complete solution (though it would be great), but any hint as to what type of algorithms or numpy/scipy functions could be used would be much appreciated.

Finally, sorry for this being a somewhat lengthy question.

Edit

Drawing of what I want to happen:

Sketch of how it should work

The stars/dots are the X and Y plotted with two modifications, I removed the first position and added a false one to make this a full example of the sought algorithm.

What I want is to, in other words, map the red-circled positions' new coordinate values (the ones written beside them) so that I can obtain the old coordinate from the new (e.g. (1, 1) -> (170.82191781, 162.67123288)). I also want points that don't approximate the ideal grid that the true points describe to be discarded (as shown), and finally the empty ideal grid positions (blue circle) to be 'filled' using the ideal grid parameters (roughly (0, 0) -> (55, 55)).

Solution

I used the code @skymandr supplied to get the ideal parameters and then did the following (not the prettiest code, but it works). That means I'm not using the get_neighbour_grid-function anymore.:

def build_grid(X, Y, x_offset, y_offset, dx, dy, grid_shape=(16,24),
    square_distance_threshold=None):

    if square_distance_threshold is None:
        square_distance_threshold = ((dx + dy) / 2.0 * 0.05) ** 2

    grid = np.zeros(grid_shape + (2,), dtype=np.float)

    D = np.zeros(grid_shape)
    for i in range(grid_shape[0]):
        for j in range(grid_shape[1]):
            D[i,j] = i * (1 + 1.0 / (grid_shape[0] + 1)) + j

    rD = D.ravel().copy()
    rD.sort()

    def find_valid(x, y):

        d = (X - x) ** 2 + (Y - y) ** 2
        valid = d < square_distance_threshold
        if valid.any():
            pos = d == d[valid].min()
            if pos.sum() == 1:
                return X[pos], Y[pos]

        return x, y

    x = x_offset
    y = y_offset
    first_loop = True

    for v in rD:
        #get new position
        coord = np.where(D == v)

        #generate a reference position already passed
        if coord[0][0] > 0:
            old_coord = (coord[0] - 1, coord[1])
        elif coord[1][0] > 0:
            old_coord = (coord[0], coord[1] - 1)

        if not first_loop:
            #calculate ideal step
            x, y = grid[old_coord].ravel()
            x += (coord[0] - old_coord[0]) * dx
            y += (coord[1] - old_coord[1]) * dy

        #modify with observed point close to ideal if exists
        x, y = find_valid(x, y)

        #put in grid
        #print coord, grid[coord].shape
        grid[coord] = np.array((x, y)).reshape(grid[coord].shape)

        first_loop = False


    return grid

It poses another question: how to nicely iterate along the diagonals of an 2D-array, but I suppose that is worthy of a question of its own: More numpy way of iterating through the 'orthogonal' diagonals of a 2D array

Edit

Updated the solution code to better deal with larger grid-sizes so that it uses a neighbouring grid position already passed as reference for the ideal coordinate for all positions. Still have to find a way to implement the better way of iterating through the grid from the linked question.

like image 802
deinonychusaur Avatar asked Jan 09 '13 17:01

deinonychusaur


1 Answers

Here is a fairly simple and cheap solution, though I don't know how robust it is.

First of all, here's a way of getting a better estimate for the spacing:

leeway = 1.10

XX = X.reshape((1, X.size))
dX = np.abs(XX - XX.T).reshape((1, X.size ** 2))
dxs = dX[np.where(np.logical_and(dX > expected_distance / leeway,
                                 dX < expected_distance * leeway))]
dx = dxs.mean()

YY = Y.reshape((1, Y.size))
dY = np.abs(YY - YY.T).reshape((1, Y.size ** 2))
dys = dY[np.where(np.logical_and(dY > expected_distance / leeway,
                                 dY < expected_distance * leeway))]
dy = dys.mean()

The code computes internal differences in X and Y, and takes the mean of those who are within 10% of the desired spacing.

For the second part, finding the offset of the grid, a similar method can be used:

Ndx = np.array([np.arange(grid_size[0])]) * dx
x_offsets = XX - Ndx.T
x_offset = np.median(x_offsets)

Ndy = np.array([np.arange(grid_size[1])]) * dy
y_offsets = YY - Ndy.T
y_offset = np.median(y_offsets)

Essentially, what this does is to let each position in X "vote" for NX = grid_size[0] positions where the bottom left point might be, based on X - n * dx where n = 0 is a vote for the point itself, n = 1 is a vote for a point one dx to the left etc. This way, the points near the true origin will get the most votes, and the offset can be found using the median.

I think this method is sufficiently symmetric around the desired origin, that the median can be used in most (if not all) cases. If, however, there are many false positives, that make the median not work for some reason, the "true" origin can be found using e.g. a histogram-method.

like image 102
skymandr Avatar answered Sep 23 '22 14:09

skymandr