Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Ordering an array for maximal pairwise matches

I have an array:

array([[ 4, 10],
       [ 4,  2],
       [ 0,  7],
       [ 5, 11],
       [ 6,  8],
       [ 3,  6],
       [ 9,  7],
       [ 2, 11],
       [ 9,  5],
       [ 8,  1]])

I want a method by which to order the value pairs so that as many as possible pairwise 2-element sets have a common value. This is an example of the desired ordered array:

array([[ 4, 10],
       [ 4,  2],
       [ 2, 11],
       [ 5, 11],
       [ 9,  5],
       [ 9,  7],
       [ 0,  7],  #note the gap here:
       [ 8,  1],
       [ 6,  8],
       [ 3,  6]])

Several conditions are true about these arrays. There are no duplicate pairs (ie: [1,0] nor [0,1] will appear elsewhere in the array if [0,1] already exists). No pair has the same value (ie: [1,1] will not exist). No pair will have more than two matches (iow: no value exists more than twice in the entire array.) but a pair can have as few as zero matches (note in the above array the gap across which there is no match).

Obviously, I can create every permutation of the array, but this seems brutish. I think there might be some way of cutting the deck and re-stacking in a logical way such that it becomes sorted in a small number of cuts. But before I go down that road, I want to: 1) Be sure that there is no numpy or collections function that already does this. 2) Know that there is no tricky-genius way to use numpy .sort() (or similar) to do it. 3) Find whether this is a common task and there is algorithm that does this. ("Oh, that's the Blumen-Funke algorithm!")

Here is some code that generates shuffled test arrays and checks sorted arrays:

def shuffled(N=12, ans=1):
    '''returns is now the unsorted test array'''
    r = range(N)
    random.shuffle(r)
    l = []
    for i in range(N):
        l.append((r[i-1], r[i]))
    random.shuffle(l)
    return np.array(l)[ans+1:]

# step 2: ???

def test_ordered(a):
    '''checks if generated array has been sorted'''
    c0 = a[1:,0]==a[:-1,0]
    c1 = a[1:,0]==a[:-1,1]
    c2 = a[1:,1]==a[:-1,0]
    c3 = a[1:,1]==a[:-1,1]
    cond = c0+c1+c2+c3
    ans = sum(numpy.logical_not(cond))
    # when sorted this should return the same number input into
    # shuffled() as 'ans':
    return ans

(Is this a subjective question? I'm being warned that it is.)

Results:

Thank you so much for all your help. Sven's solution was about 20% faster than Paul's and happily, they both run in linear time, Doug's answer did not solve the problem. There was a high, but also largely linear dependence of performance on the number of "breaks" or "gaps" in the input data. See the plot below. The 10,000 magnitude axis is N. the 0.5 axis is the percentage of breaks. the z axis is performance in seconds.

Thanks again!

enter image description here

like image 877
Paul Avatar asked May 17 '26 03:05

Paul


1 Answers

You've described a graph where the vertices are numbers, and the edges are your pairs.

Your conditions specify that each number appears once or twice in the list. This means that connected components in your graph are lines (or cycles). You can find them using this algorithm:

  • [Line exists] If possible, pick a number with degree 1 (that is, it only appears once in the list). Follow the chain of pairs as far as you can, adding them to the output and removing the traversed vertices from your graph.
  • [Cycle exists] If there was no number with degree 1 it means all the components are cycles. Pick any vertex (it will have degree 2). Follow the pairs as before, adding them to the output and removing the traversed vertices, but this time stop when you reach your original number.
  • Repeat, until you've used up all the vertices in the graph.

You can run this algorithm efficiently: maintain a set of vertices of degree 1 and another of degree 2. When you consume an edge (a pair in your original list), modify the sets appropriately: remove an endpoint of degree 1 from the first set, and move an endpoint from the degree 2 set to the degree 1 set. Alternatively, use a priority queue.

You'll also need an efficient lookup for your pairs: build a dict from vertex to a list of adjacent vertices.

Using these ideas, you can find a best ordering in linear time (assuming O(1) set and dict implementations).

Here's a somewhat clunky implementation.

import collections

def handle_vertex(v, vs):
  if v in vs[0]:
    vs[0].remove(v)
  elif v in vs[1]:
    vs[0].add(v)
    vs[1].remove(v)

def follow(edgemap, start, vs):
  """Follow a path in the graph, yielding the edges."""
  v0 = start
  last = None
  while True:
    # All vertices that we can go to next.
    next_v = [v for v in edgemap[v0] if v != last]
    if not next_v:
      # We've reached the end (we must have been a line).
      return
    assert len(next_v) == 1 or (v0 == start and len(next_v) == 2)
    next_v = next_v[0]
    # Remove the endpoints from the vertex-degree sets.
    handle_vertex(v0, vs)
    handle_vertex(next_v, vs)
    yield v0, next_v
    if next_v == start:
      # We've got back to the start (we must have been a cycle).
      return
    v0, last = next_v, v0

def pairsort(edges):
  edgemap = collections.defaultdict(list)
  original_edges = {}
  for a, b in edges:
    # Build the adjacency table.
    edgemap[a].append(b)
    edgemap[b].append(a)
    # Keep a map to remember the original order pairs appeared in
    # so we can output edges correctly no matter which way round
    # we store them.
    original_edges[a, b] = [a, b]
    original_edges[b, a] = [a, b]
  # Build sets of degree 1 and degree 2 vertices.
  vs = [set(), set()]
  for k, v in edgemap.iteritems():
    vs[len(v) - 1].add(k)
  # Find all components that are lines.
  while vs[0]:
    v0 = vs[0].pop()
    for e in follow(edgemap, v0, vs):
      yield original_edges[e]
  # Find all components that are cycles.
  while vs[1]:
    v0 = vs[1].pop()
    for e in follow(edgemap, v0, vs):
      yield original_edges[e]

input = [
    [ 4, 10],
    [ 4,  2],
    [ 0,  7],
    [ 5, 11],
    [ 6,  8],
    [ 3,  6],
    [ 9,  7],
    [ 2, 11],
    [ 9,  5],
    [ 8,  1]]

print list(pairsort(input))