Group Python list of lists into groups based on overlapping items

Question

I have a list of lists and I am trying to group or cluster them based on their items. A nested list starts a new group if none of the elements are in the previous group.

Input:

paths = [  
        ['D', 'B', 'A', 'H'],  
        ['D', 'B', 'A', 'C'],  
        ['H', 'A', 'C'],  
        ['E', 'G', 'I'],  
        ['F', 'G', 'I']]

My failed Code:

paths = [
    ['D', 'B', 'A', 'H'],
    ['D', 'B', 'A', 'C'],
    ['H', 'A', 'C'],
    ['E', 'G', 'I'],
    ['F', 'G', 'I']
]
groups = []
paths_clone = paths
for path in paths:
    for node in path:
        for path_clone in paths_clone:
            if node in path_clone:
                if not path == path_clone:
                    groups.append([path, path_clone])
                else:
                    groups.append(path)
print groups

Expected Output:

[
 [
  ['D', 'B', 'A', 'H'],
  ['D', 'B', 'A', 'C'],
  ['H', 'A', 'C']
 ],
 [
  ['E', 'G', 'I'],
  ['F', 'G', 'I']
 ]
]

Another Example:

paths = [['shifter', 'barrel', 'barrel shifter'],
         ['ARM', 'barrel', 'barrel shifter'],
         ['IP power', 'IP', 'power'],
         ['ARM', 'barrel', 'shifter']]

Expected Output Groups:

output = [
         [['shifter', 'barrel', 'barrel shifter'],
         ['ARM', 'barrel', 'barrel shifter'],
         ['ARM', 'barrel', 'shifter']],
         [['IP power', 'IP', 'power']],
         ]

Martijn Pieters · Accepted Answer

You are grouping based on sets, so use a set to detect new groups:

def grouper(sequence):
    group, members = [], set()

    for item in sequence:
        if group and members.isdisjoint(item):
            # new group, yield and start new
            yield group
            group, members = [], set()
        group.append(item)
        members.update(item)

    yield group

This gives:

>>> for group in grouper(paths):
...     print group
... 
[['D', 'B', 'A', 'H'], ['D', 'B', 'A', 'C'], ['H', 'A', 'C']]
[['E', 'G', 'I'], ['F', 'G', 'I']]

or you could cast it to a list again:

output = list(grouper(paths))

This assumes that the groups are contiguous. If you have disjoint groups, you need to process the whole list and loop over all groups constructed so far for each item:

def grouper(sequence):
    result = []  # will hold (members, group) tuples

    for item in sequence:
        for members, group in result:
            if members.intersection(item):  # overlap
                members.update(item)
                group.append(item)
                break
        else:  # no group found, add new
            result.append((set(item), [item]))

    return [group for members, group in result]

abarnert · Answer

As with most questions in Python that start with "I'm trying to group… by foo…", the answer is "Use itertools.groupby with foo as a key."

First, let's take a very simple grouping criterion: length of each list. For that, the key function is just len. (You may also want to sort first, probably with the same key, depending on your data.)

groups = [list(group) for key, group in itertools.groupby(paths, len)]

Sometimes defining the grouping criterion (and therefore key function) is hard, or impossible, to define in terms of an independent transformation on each element. In those cases, groupby is generally not the answer (although groupby plus another itertools function still might be).

In this case, the most natural way to define the grouping criterion is by comparison with adjacent elements. The easiest way to write that is to write a cmp function that compares two adjacent elements, and then use functools.cmp_to_key on it:

def cmp_paths(lhs, rhs):
    return 0 if any(key in lhs for key in rhs) else -1
key_paths = functools.cmp_to_key(cmp_paths)
groups = [list(group) for key, group in itertools.groupby(paths, key_paths)]

Group Python list of lists into groups based on overlapping items

Tags:

python

algorithm

list

Shankar

2 Answers

Martijn Pieters

abarnert

Recent Activity

Donate For Us

Group Python list of lists into groups based on overlapping items

Tags:

python

algorithm

list

Shankar

2 Answers

Martijn Pieters

abarnert

Related questions

Recent Activity

Donate For Us