I have a small sorting problem. I'm comparing a couple of thousands of clustering solutions to find and count identical solutions. The problem I have is the following:
Given two clustering solutions:
l1 = [1,1,1,2,2,3,3,2,2]
l2 = [2,2,2,1,1,3,3,1,1]
These two solutions are identical, but the group labels are different. How do I sort l2 to make it look like l1? I want to do this to make hashing easier.
I think I understand your problem, something like the function below would work, I think you'd have a hard time doing this in less than O(n) time.
def normalize_groups(in_list):
groups = {}
output = []
for value in in_list:
if value not in groups:
groups[value] = len(groups)
output.append(groups[value])
return output
which would return for both lists:
In [52]: normalize_groups(l1)
Out[52]: [0, 0, 0, 1, 1, 2, 2, 1, 1]
In [53]: normalize_groups(l2)
Out[53]: [0, 0, 0, 1, 1, 2, 2, 1, 1]
EDIT: nevermind, just remove the string part completely.
You should just be able to traverse L2 and create a mapping to L1 as you go.
mapping = {}
for i, (a, b) in enumerate(zip(l1, l2)):
if b in mapping:
l2[i] = mapping[b]
else:
mapping[b] = a
l2[i] = a
# Now l2 = [1, 1, 1, 2, 2, 3, 3, 2, 2]. Can compare for equality.
This is probably better referred to as label reassignment. It's relatively efficient -- it uses iterators, quick dictionary lookups, and is O(n).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With