Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Group Python list of lists into groups based on overlapping items

I have a list of lists and I am trying to group or cluster them based on their items. A nested list starts a new group if none of the elements are in the previous group.

Input:

paths = [  
        ['D', 'B', 'A', 'H'],  
        ['D', 'B', 'A', 'C'],  
        ['H', 'A', 'C'],  
        ['E', 'G', 'I'],  
        ['F', 'G', 'I']]

My failed Code:

paths = [
    ['D', 'B', 'A', 'H'],
    ['D', 'B', 'A', 'C'],
    ['H', 'A', 'C'],
    ['E', 'G', 'I'],
    ['F', 'G', 'I']
]
groups = []
paths_clone = paths
for path in paths:
    for node in path:
        for path_clone in paths_clone:
            if node in path_clone:
                if not path == path_clone:
                    groups.append([path, path_clone])
                else:
                    groups.append(path)
print groups

Expected Output:

[
 [
  ['D', 'B', 'A', 'H'],
  ['D', 'B', 'A', 'C'],
  ['H', 'A', 'C']
 ],
 [
  ['E', 'G', 'I'],
  ['F', 'G', 'I']
 ]
]

Another Example:

paths = [['shifter', 'barrel', 'barrel shifter'],
         ['ARM', 'barrel', 'barrel shifter'],
         ['IP power', 'IP', 'power'],
         ['ARM', 'barrel', 'shifter']]

Expected Output Groups:

output = [
         [['shifter', 'barrel', 'barrel shifter'],
         ['ARM', 'barrel', 'barrel shifter'],
         ['ARM', 'barrel', 'shifter']],
         [['IP power', 'IP', 'power']],
         ]
like image 848
Shankar Avatar asked Jan 13 '23 12:01

Shankar


2 Answers

You are grouping based on sets, so use a set to detect new groups:

def grouper(sequence):
    group, members = [], set()

    for item in sequence:
        if group and members.isdisjoint(item):
            # new group, yield and start new
            yield group
            group, members = [], set()
        group.append(item)
        members.update(item)

    yield group

This gives:

>>> for group in grouper(paths):
...     print group
... 
[['D', 'B', 'A', 'H'], ['D', 'B', 'A', 'C'], ['H', 'A', 'C']]
[['E', 'G', 'I'], ['F', 'G', 'I']]

or you could cast it to a list again:

output = list(grouper(paths))

This assumes that the groups are contiguous. If you have disjoint groups, you need to process the whole list and loop over all groups constructed so far for each item:

def grouper(sequence):
    result = []  # will hold (members, group) tuples

    for item in sequence:
        for members, group in result:
            if members.intersection(item):  # overlap
                members.update(item)
                group.append(item)
                break
        else:  # no group found, add new
            result.append((set(item), [item]))

    return [group for members, group in result]
like image 98
Martijn Pieters Avatar answered Jan 16 '23 02:01

Martijn Pieters


As with most questions in Python that start with "I'm trying to group… by foo…", the answer is "Use itertools.groupby with foo as a key."


First, let's take a very simple grouping criterion: length of each list. For that, the key function is just len. (You may also want to sort first, probably with the same key, depending on your data.)

groups = [list(group) for key, group in itertools.groupby(paths, len)]

Sometimes defining the grouping criterion (and therefore key function) is hard, or impossible, to define in terms of an independent transformation on each element. In those cases, groupby is generally not the answer (although groupby plus another itertools function still might be).

In this case, the most natural way to define the grouping criterion is by comparison with adjacent elements. The easiest way to write that is to write a cmp function that compares two adjacent elements, and then use functools.cmp_to_key on it:

def cmp_paths(lhs, rhs):
    return 0 if any(key in lhs for key in rhs) else -1
key_paths = functools.cmp_to_key(cmp_paths)
groups = [list(group) for key, group in itertools.groupby(paths, key_paths)]
like image 30
abarnert Avatar answered Jan 16 '23 00:01

abarnert