Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Ordering a list of items in place to compare clustering solutions

I have a small sorting problem. I'm comparing a couple of thousands of clustering solutions to find and count identical solutions. The problem I have is the following:

Given two clustering solutions:

l1 = [1,1,1,2,2,3,3,2,2]
l2 = [2,2,2,1,1,3,3,1,1]

These two solutions are identical, but the group labels are different. How do I sort l2 to make it look like l1? I want to do this to make hashing easier.

like image 819
Misconstruction Avatar asked Nov 24 '25 19:11

Misconstruction


2 Answers

I think I understand your problem, something like the function below would work, I think you'd have a hard time doing this in less than O(n) time.

def normalize_groups(in_list):
    groups = {}
    output = []
    for value in in_list:
        if value not in groups:
            groups[value] = len(groups)
        output.append(groups[value])
    return output

which would return for both lists:

In [52]: normalize_groups(l1)
Out[52]: [0, 0, 0, 1, 1, 2, 2, 1, 1]

In [53]: normalize_groups(l2)
Out[53]: [0, 0, 0, 1, 1, 2, 2, 1, 1]

EDIT: nevermind, just remove the string part completely.

like image 92
Cameron Sparr Avatar answered Nov 26 '25 09:11

Cameron Sparr


You should just be able to traverse L2 and create a mapping to L1 as you go.

mapping = {}
for i, (a, b) in enumerate(zip(l1, l2)):
  if b in mapping:
    l2[i] = mapping[b]
  else:
    mapping[b] = a
    l2[i] = a

 # Now l2 = [1, 1, 1, 2, 2, 3, 3, 2, 2].  Can compare for equality.

This is probably better referred to as label reassignment. It's relatively efficient -- it uses iterators, quick dictionary lookups, and is O(n).

like image 36
Travis D. Avatar answered Nov 26 '25 11:11

Travis D.