I'm looking for some kind of "Domino sort" algorithm that sorts a list of two-sided items based on the similarity of "tangent" sides of subsequent items.
Suppose the following list where items are represented by 2-tuples:
>>> items
[(0.72, 0.12),
 (0.11, 0.67),
 (0.74, 0.65),
 (0.32, 0.52),
 (0.82, 0.43),
 (0.94, 0.64),
 (0.39, 0.95),
 (0.01, 0.72),
 (0.49, 0.41),
 (0.27, 0.60)]
The goal is to sort that list such that the sum of squared differences of the tangent ends of each two subsequent items (the loss) is minimal:
>>> loss = sum(
...     (items[i][1] - items[i+1][0])**2
...     for i in range(len(items)-1)
... )
For the above example this can be computed by just working through all possible permutations, but for lists with more items this quickly becomes infeasible (O(n!)).
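For reference, such an exhaustive search can be sketched like this (the helper name brute_force is just for illustration; this is only practical for about ten items):

from itertools import permutations

def brute_force(items):
    # Exhaustive O(n!) search over all orderings.
    def loss(order):
        return sum((order[i][1] - order[i + 1][0]) ** 2
                   for i in range(len(order) - 1))
    return min(permutations(items), key=loss)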
The greedy approach of selecting the best match step by step, as sketched here,
def compute_loss(items):
    return sum((items[i][1] - items[i+1][0])**2 for i in range(len(items)-1))

def domino_sort(items):
    best_attempt = items
    best_score = compute_loss(best_attempt)
    for i in range(len(items)):  # try every item as the starting piece
        copy = [x for x in items]
        attempt = [copy.pop(i)]
        for j in range(len(copy)):
            # greedily pick the remaining item whose left side is
            # closest to the right side of the last placed item
            copy = sorted(copy, key=lambda x: abs(x[0] - attempt[-1][1]))
            attempt.append(copy.pop(0))
        score = compute_loss(attempt)
        if score < best_score:
            best_attempt = attempt
            best_score = score
    return best_attempt, best_score
gives the following result with a loss of 0.1381:
[(0.01, 0.72),
 (0.72, 0.12),
 (0.11, 0.67),
 (0.74, 0.65),
 (0.49, 0.41),
 (0.39, 0.95),
 (0.94, 0.64),
 (0.82, 0.43),
 (0.32, 0.52),
 (0.27, 0.6)]
This is, however, not the best solution, which would be:
[(0.01, 0.72),
 (0.82, 0.43),
 (0.27, 0.6),
 (0.49, 0.41),
 (0.32, 0.52),
 (0.39, 0.95),
 (0.94, 0.64),
 (0.72, 0.12),
 (0.11, 0.67),
 (0.74, 0.65)]
with a loss of 0.0842. Obviously, the above algorithm performs well for the first few items, but the differences for the last ones grow so large that they dominate the loss.
Is there any algorithm that can perform this kind of sort in acceptable time (feasible for lists of hundreds of items)?
If it's not possible to do this kind of sort exactly in less than O(n!), are there any approximate approaches that are likely to return a good score (small loss)?
In general, this problem is about finding a minimum-length Hamiltonian path, which is closely related to the famous Travelling Salesman Problem (TSP). It does not look like one of the special cases of this problem that can be solved in polynomial time.
There is a huge number of heuristics and approximate algorithms for solving TSP. This Wikipedia article could be a good place to start.
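For moderate sizes, the minimum Hamiltonian path can still be found exactly with the Held-Karp dynamic program over subsets, which runs in O(n² · 2ⁿ) instead of O(n!). The sketch below and its function name are mine, not a library API; it is practical up to roughly 20 items, not for hundreds:

def held_karp_path(items):
    # Exact minimum-loss ordering in O(n^2 * 2^n) time and
    # O(n * 2^n) memory; feasible up to ~20 items.
    n = len(items)
    cost = [[(items[i][1] - items[j][0]) ** 2 for j in range(n)]
            for i in range(n)]
    INF = float('inf')
    # dp[mask][last]: minimal loss of a path that visits the item
    # set encoded in `mask` and ends at item `last`
    dp = [[INF] * n for _ in range(1 << n)]
    parent = [[-1] * n for _ in range(1 << n)]
    for i in range(n):
        dp[1 << i][i] = 0.0
    for mask in range(1 << n):
        for last in range(n):
            if dp[mask][last] == INF:
                continue
            for nxt in range(n):
                if (mask >> nxt) & 1:
                    continue
                cand = dp[mask][last] + cost[last][nxt]
                if cand < dp[mask | (1 << nxt)][nxt]:
                    dp[mask | (1 << nxt)][nxt] = cand
                    parent[mask | (1 << nxt)][nxt] = last
    full = (1 << n) - 1
    best_last = min(range(n), key=lambda i: dp[full][i])
    best_loss = dp[full][best_last]
    order, mask, last = [], full, best_last
    while last != -1:  # walk the parent pointers back to the start
        order.append(items[last])
        mask, last = mask ^ (1 << last), parent[mask][last]
    return order[::-1], best_loss

# e.g. order, best = held_karp_path(items)
# being exact, it recovers the optimal loss (≈ 0.0842 for the example above)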
A slightly more efficient version of the naive greedy approach, using bisect (implementation kudos: https://stackoverflow.com/a/12141511/6163736):
# Domino Packing
from bisect import bisect_left
from pprint import pprint

def compute_loss(items):
    return sum((items[i][1] - items[i+1][0])**2 for i in range(len(items)-1))

def find_nearest(values, target):
    """
    Assumes values is sorted. Returns the index of the value closest
    to target. If two numbers are equally close, the smaller one wins.
    """
    idx = bisect_left(values, target)
    if idx == 0:
        return 0
    if idx == len(values):
        return -1
    before = values[idx - 1]
    after = values[idx]
    if after - target < target - before:
        return idx      # after
    else:
        return idx - 1  # before

if __name__ == '__main__':
    dominos = [(0.72, 0.12),
               (0.11, 0.67),
               (0.74, 0.65),
               (0.32, 0.52),
               (0.82, 0.43),
               (0.94, 0.64),
               (0.39, 0.95),
               (0.01, 0.72),
               (0.49, 0.41),
               (0.27, 0.60)]

    # keep the left sides sorted so find_nearest can binary-search them
    dominos = sorted(dominos, key=lambda x: x[0])
    x_values, y_values = [list(l) for l in zip(*dominos)]
    packed = list()
    idx = 0
    for _ in range(len(dominos)):
        x = x_values[idx]
        y = y_values[idx]
        del x_values[idx]
        del y_values[idx]
        # greedily continue with the piece whose left side is nearest
        # to the right side of the piece just placed
        idx = find_nearest(x_values, y)
        packed.append((x, y))

    pprint(packed)
    print("loss: %f" % compute_loss(packed))
output:
[(0.01, 0.72),
 (0.72, 0.12),
 (0.11, 0.67),
 (0.74, 0.65),
 (0.49, 0.41),
 (0.39, 0.95),
 (0.94, 0.64),
 (0.82, 0.43),
 (0.32, 0.52),
 (0.27, 0.6)]
loss: 0.138100
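This greedy packing ends with the same loss of 0.1381 as the step-by-step approach in the question. A local-search pass such as 2-opt can usually lower it further. Note that the cost here is asymmetric (reversing a segment changes the direction of every edge inside it), so the sketch below, which reuses compute_loss from above, simply recomputes the full loss for each candidate move; the function name two_opt is mine:

def two_opt(order):
    # Repeatedly reverse the sub-segment order[i:j+1] whenever that
    # lowers the total loss; stop when no reversal improves it.
    order = list(order)
    best = compute_loss(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            for j in range(i + 1, len(order)):
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                score = compute_loss(candidate)
                if score < best:
                    order, best, improved = candidate, score, True
    return order, best

Each pass is O(n³) in the worst case, which is still tractable for a few hundred items, though the result is a local optimum and not guaranteed to be the global one.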