Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sort a list of two-sided items based on the similarity of consecutive items

I'm looking for some kind of "Domino sort" algorithm that sorts a list of two-sided items based on the similarity of "tangent" sides of subsequent items.

Suppose the following list where items are represented by 2-tuples:

>>> items
[(0.72, 0.12),
 (0.11, 0.67),
 (0.74, 0.65),
 (0.32, 0.52),
 (0.82, 0.43),
 (0.94, 0.64),
 (0.39, 0.95),
 (0.01, 0.72),
 (0.49, 0.41),
 (0.27, 0.60)]

The goal is to sort that list such that the sum of squared differences of the tangent ends of each two subsequent items (the loss) is minimal:

>>> loss = sum(
...     (items[i][1] - items[i+1][0])**2
...     for i in range(len(items)-1)
... )

For the above example this can be computed by just working through all possible permutations but for lists with more items this becomes quickly unfeasible (O(n!)).

The approach of selecting the best match step-by-step as sketched here

def compute_loss(items):
    return sum((items[i][1] - items[i+1][0])**2 for i in range(len(items)-1))


def domino_sort(items):
    best_attempt = items
    best_score = compute_loss(best_attempt)
    for i in range(len(items)):
        copy = [x for x in items]
        attempt = [copy.pop(i)]
        for j in range(len(copy)):
            copy = sorted(copy, key=lambda x: abs(x[0] - attempt[-1][1]))
            attempt.append(copy.pop(0))
        score = compute_loss(attempt)
        if score < best_score:
            best_attempt = attempt
            best_score = score
    return best_attempt, best_score

gives the following result with a loss of 0.1381:

[(0.01, 0.72),
 (0.72, 0.12),
 (0.11, 0.67),
 (0.74, 0.65),
 (0.49, 0.41),
 (0.39, 0.95),
 (0.94, 0.64),
 (0.82, 0.43),
 (0.32, 0.52),
 (0.27, 0.6)]

This is however not the best solution which would be

[(0.01, 0.72),
 (0.82, 0.43),
 (0.27, 0.6),
 (0.49, 0.41),
 (0.32, 0.52),
 (0.39, 0.95),
 (0.94, 0.64),
 (0.72, 0.12),
 (0.11, 0.67),
 (0.74, 0.65)]

with a loss of 0.0842. Obviously the above algorithm performs well for the first few items however the differences for the last ones grow so large that they dominate the loss.

Is there any algorithm that can perform this kind of sort in acceptable time dependence (feasible for lists of hundreds of items)?

If it's not possible to do this kind of sort exactly in less than O(n!) are there any approximate approaches that are likely to return a good score (small loss)?

like image 325
a_guest Avatar asked Feb 28 '18 19:02

a_guest


2 Answers

In general, this problem is about finding a Hamiltonian path with a minimum length that is closely related to famous Travelling salesman problem (TSP). And it does not look like a special case of this problem that can be solved in polynomial time.

There is a huge amount of heuristics and approximate algorithms for solving TSP. This wikipedia article could be a good place to start.

like image 151
DAle Avatar answered Nov 02 '22 15:11

DAle


Slightly more efficient version of the naive approach using bisect.
(implementation kudos: https://stackoverflow.com/a/12141511/6163736)

# Domino Packing
from bisect import bisect_left
from pprint import pprint


def compute_loss(items):
    return sum((items[i][1] - items[i+1][0])**2 for i in range(len(items)-1))


def find_nearest(values, target):
    """
    Assumes values is sorted. Returns closest value to target.
    If two numbers are equally close, return the smallest number.
    """
    idx = bisect_left(values, target)
    if idx == 0:
        return 0
    if idx == len(values):
        return -1
    before = values[idx - 1]
    after = values[idx]
    if after - target < target - before:
        return idx      # after
    else:
        return idx - 1  # before


if __name__ == '__main__':

    dominos = [(0.72, 0.12),
               (0.11, 0.67),
               (0.74, 0.65),
               (0.32, 0.52),
               (0.82, 0.43),
               (0.94, 0.64),
               (0.39, 0.95),
               (0.01, 0.72),
               (0.49, 0.41),
               (0.27, 0.60)]

    dominos = sorted(dominos, key=lambda x: x[0])
    x_values, y_values = [list(l) for l in zip(*dominos)]
    packed = list()
    idx = 0

    for _ in range(len(dominos)):
        x = x_values[idx]
        y = y_values[idx]
        del x_values[idx]
        del y_values[idx]

        idx = find_nearest(x_values, y)
        packed.append((x, y))

    pprint(packed)
    print("loss :%f" % compute_loss(packed))

output:

[(0.01, 0.72),
 (0.72, 0.12),
 (0.11, 0.67),
 (0.74, 0.65),
 (0.49, 0.41),
 (0.39, 0.95),
 (0.94, 0.64),
 (0.82, 0.43),
 (0.32, 0.52),
 (0.27, 0.6)]
loss :0.138100
like image 42
stacksonstacks Avatar answered Nov 02 '22 14:11

stacksonstacks