Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python: how to merge a list into clusters?

I have a list of tuples:

[(3,4), (18,27), (4,14)]

and need a code merging tuples which has repeated numbers, making another list where all list elements will only contain unique numbers. The list should be sorted by the length of the tuples, i.e.:

>>> MergeThat([(3,4), (18,27), (4,14)])
[(3,4,14), (18,27)]

>>> MergeThat([(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)])
[(57,66,76,85), (1,3,10), (15,21)]

I understand it's something similar to hierarchical clustering algorithms, which I've read about, but can't figure them out.

Is there a relatively simple code for a MergeThat() function?

like image 393
user63503 Avatar asked Dec 12 '22 19:12

user63503


2 Answers

I tried hard to figure this out, but only after I tried the approach Ian's answer (thanks!) suggested I realized what the theoretical problem is: The input is a list of edges and defines a graph. We are looking for the strongly connected components of this graph. It's simple as that.

While you can do this efficiently, there is actually no reason to implement this yourself! Just import a good graph library:

import networkx as nx

# one of your examples
g1 = nx.Graph([(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)])
print nx.connected_components(g1) # [[57, 66, 76, 85], [1, 10, 3], [21, 15]]

# my own test case
g2 =  nx.Graph([(1,2),(2,10), (20,3), (3,4), (4,10)])
print nx.connected_components(g2) # [[1, 2, 3, 4, 10, 20]]
like image 146
Jochen Ritzel Avatar answered Dec 15 '22 07:12

Jochen Ritzel


import itertools

def merge_it(lot):
    merged = [ set(x) for x in lot ] # operate on sets only
    finished = False
    while not finished:
        finished = True
        for a, b in itertools.combinations(merged, 2):
            if a & b:
                # we merged in this iteration, we may have to do one more
                finished = False
                if a in merged: merged.remove(a)
                if b in merged: merged.remove(b)    
                merged.append(a.union(b))
                break # don't inflate 'merged' with intermediate results
    return merged

if __name__ == '__main__':
    print merge_it( [(3,4), (18,27), (4,14)] )
    # => [set([18, 27]), set([3, 4, 14])]

    print merge_it( [(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)] )
    # => [set([21, 15]), set([1, 10, 3]), set([57, 66, 76, 85])]

    print merge_it( [(1,2), (2,3), (3,4), (4,5), (5,9)] )
    # => [set([1, 2, 3, 4, 5, 9])]

Here's a snippet (including doctests): http://gist.github.com/586252

like image 29
miku Avatar answered Dec 15 '22 08:12

miku