How to efficiently group pairs based on shared item?

Question

I have a list of pairs (tuples), for simplification something like this:

L = [("A","B"), ("B","C"), ("C","D"), ("E","F"), ("G","H"), ("H","I"), ("G","I"), ("G","J")]

Using python I want efficiently split this list to:

L1 = [("A","B"), ("B","C"), ("C","D")]
L2 = [("E","F")]
L3 = [("G","H"), ("G","I"), ("G","J"), ("H","I")]

How to efficiently split list into groups of pairs, where for pairs in the group there must be always at least one pair which shares one item with others? As stated in one of the answers this is actually network problem. The goal is to efficiently split network into disconnected (isolated) network parts.

Type lists, tuples (sets) may be changed for achieving higher efficiency.

BENY · Accepted Answer

This is more like a network problem, so we can use networkx:

import networkx as nx
G=nx.from_edgelist(L)

l=list(nx.connected_components(G))
# after that we create the map dict , for get the unique id for each nodes
mapdict={z:x for x, y in enumerate(l) for z in y }
# then append the id back to original data for groupby 
newlist=[ x+(mapdict[x[0]],)for  x in L]
import itertools
#using groupby make the same id into one sublist
newlist=sorted(newlist,key=lambda x : x[2])
yourlist=[list(y) for x , y in itertools.groupby(newlist,key=lambda x : x[2])]
yourlist
[[('A', 'B', 0), ('B', 'C', 0), ('C', 'D', 0)], [('E', 'F', 1)], [('G', 'H', 2), ('H', 'I', 2), ('G', 'I', 2), ('G', 'J', 2)]]

Then to match your output format:

L1,L2,L3=[[y[:2]for y in x] for x in yourlist]
L1
[('A', 'B'), ('B', 'C'), ('C', 'D')]
L2
[('E', 'F')]
L3
[('G', 'H'), ('H', 'I'), ('G', 'I'), ('G', 'J')]

Amadan · Answer

Initialise a list of groups as empty
Let (a, b) be the next pair
Collect all groups that contain any elements with a or b
Remove them all, join them, add (a, b), and insert as a new group
Repeat till done

That'd be something like this:

import itertools, functools

def partition(pred, iterable):
    t1, t2 = itertools.tee(iterable)
    return itertools.filterfalse(pred, t1), filter(pred, t2)

groups = []
for a, b in L:
    unrelated, related = partition(lambda group: any(aa == a or bb == b or aa == b or bb == a for aa, bb in group), groups)
    groups = [*unrelated, sum(related, [(a, b)])]

Allan · Answer

You can use the following code:

l = [("A","B"), ("B","C"), ("C","D"), ("E","F"), ("G","H"), ("H","I"), ("G","I"), ("G","J")]

result = []
if len(l) > 1:
  tmp = [l[0]]
  for i in range(1,len(l)):
    if l[i][0] == l[i-1][1] or l[i][1] == l[i-1][0] or l[i][1] == l[i-1][1] or l[i][0] == l[i-1][0]:
      tmp.append(l[i])
    else:
      result.append(tmp)
      tmp = [l[i]]
  result.append(tmp)
else:
  result = l

for elem in result:
  print(elem)

output:

[('A', 'B'), ('B', 'C'), ('C', 'D')]
[('E', 'F')]
[('G', 'H'), ('H', 'I'), ('G', 'I'), ('G', 'J')]

Note: this code is based on the hypothesis that your initial array is sorted. If this is not the case it will not work as it does only one pass on the whole list to create the groups (complexity O(n)).

Explanations:

result will store your groups
if len(l) > 1: if you have only one element in your list or an empty list no need to do any processing you have the answer
You will to a one pass on each element of the list and compare the 4 possible equality between the tuple at position i and the one at position i-1.
tmp is used to construct your groups, as long as the condition is met you add tuples to tmp
when the condition is not respected you add tmp (the current group that has been created to the result, reinitiate tmp with the current tuple) and you continue.

blhsing · Answer

An efficient and Pythonic approach is to convert the list of tuples to a set of frozensets as a pool of candidates, and in a while loop, create a set as group and use a nested while loop to keep expanding the group by adding the first candidate set and then performing set union with other candidate sets that intersects with the group until there is no more intersecting candidate, at which point go back to the outer loop to form a new group:

pool = set(map(frozenset, L))
groups = []
while pool:
    group = set()
    groups.append([])
    while True:
        for candidate in pool:
            if not group or group & candidate:
                group |= candidate
                groups[-1].append(tuple(candidate))
                pool.remove(candidate)
                break
        else:
            break

Given your sample input, groups will become:

[[('A', 'B'), ('C', 'B'), ('C', 'D')],
 [('G', 'H'), ('H', 'I'), ('G', 'J'), ('G', 'I')],
 [('E', 'F')]]

Keep in mind that sets are unordered in Python, which is why the order of the above output doesn't match your expected output, but for your purpose the order should not matter.

How to efficiently group pairs based on shared item?

Tags:

python

networking

grouping

Miro

4 Answers

BENY

Amadan

Allan

blhsing

Recent Activity

Donate For Us

How to efficiently group pairs based on shared item?

Tags:

python

networking

grouping

Miro

4 Answers

BENY

Amadan

Allan

blhsing

Related questions

Recent Activity

Donate For Us