I have a list of lists and I am trying to group or cluster them based on their items. A nested list starts a new group if none of the elements are in the previous group.
Input:
paths = [
['D', 'B', 'A', 'H'],
['D', 'B', 'A', 'C'],
['H', 'A', 'C'],
['E', 'G', 'I'],
['F', 'G', 'I']]
My failed Code:
paths = [
['D', 'B', 'A', 'H'],
['D', 'B', 'A', 'C'],
['H', 'A', 'C'],
['E', 'G', 'I'],
['F', 'G', 'I']
]
groups = []
paths_clone = paths
for path in paths:
for node in path:
for path_clone in paths_clone:
if node in path_clone:
if not path == path_clone:
groups.append([path, path_clone])
else:
groups.append(path)
print groups
Expected Output:
[
[
['D', 'B', 'A', 'H'],
['D', 'B', 'A', 'C'],
['H', 'A', 'C']
],
[
['E', 'G', 'I'],
['F', 'G', 'I']
]
]
Another Example:
paths = [['shifter', 'barrel', 'barrel shifter'],
['ARM', 'barrel', 'barrel shifter'],
['IP power', 'IP', 'power'],
['ARM', 'barrel', 'shifter']]
Expected Output Groups:
output = [
[['shifter', 'barrel', 'barrel shifter'],
['ARM', 'barrel', 'barrel shifter'],
['ARM', 'barrel', 'shifter']],
[['IP power', 'IP', 'power']],
]
You are grouping based on sets, so use a set to detect new groups:
def grouper(sequence):
group, members = [], set()
for item in sequence:
if group and members.isdisjoint(item):
# new group, yield and start new
yield group
group, members = [], set()
group.append(item)
members.update(item)
yield group
This gives:
>>> for group in grouper(paths):
... print group
...
[['D', 'B', 'A', 'H'], ['D', 'B', 'A', 'C'], ['H', 'A', 'C']]
[['E', 'G', 'I'], ['F', 'G', 'I']]
or you could cast it to a list again:
output = list(grouper(paths))
This assumes that the groups are contiguous. If you have disjoint groups, you need to process the whole list and loop over all groups constructed so far for each item:
def grouper(sequence):
result = [] # will hold (members, group) tuples
for item in sequence:
for members, group in result:
if members.intersection(item): # overlap
members.update(item)
group.append(item)
break
else: # no group found, add new
result.append((set(item), [item]))
return [group for members, group in result]
As with most questions in Python that start with "I'm trying to group… by foo…", the answer is "Use itertools.groupby
with foo as a key."
First, let's take a very simple grouping criterion: length of each list. For that, the key function is just len
. (You may also want to sort
first, probably with the same key, depending on your data.)
groups = [list(group) for key, group in itertools.groupby(paths, len)]
Sometimes defining the grouping criterion (and therefore key function) is hard, or impossible, to define in terms of an independent transformation on each element. In those cases, groupby
is generally not the answer (although groupby
plus another itertools
function still might be).
In this case, the most natural way to define the grouping criterion is by comparison with adjacent elements. The easiest way to write that is to write a cmp
function that compares two adjacent elements, and then use functools.cmp_to_key
on it:
def cmp_paths(lhs, rhs):
return 0 if any(key in lhs for key in rhs) else -1
key_paths = functools.cmp_to_key(cmp_paths)
groups = [list(group) for key, group in itertools.groupby(paths, key_paths)]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With