Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to efficiently group pairs based on shared item?

I have a list of pairs (tuples), for simplification something like this:

L = [("A","B"), ("B","C"), ("C","D"), ("E","F"), ("G","H"), ("H","I"), ("G","I"), ("G","J")]

Using python I want efficiently split this list to:

L1 = [("A","B"), ("B","C"), ("C","D")]
L2 = [("E","F")]
L3 = [("G","H"), ("G","I"), ("G","J"), ("H","I")]

How to efficiently split list into groups of pairs, where for pairs in the group there must be always at least one pair which shares one item with others? As stated in one of the answers this is actually network problem. The goal is to efficiently split network into disconnected (isolated) network parts.

Type lists, tuples (sets) may be changed for achieving higher efficiency.

like image 435
Miro Avatar asked Mar 19 '19 01:03

Miro


4 Answers

This is more like a network problem, so we can use networkx:

import networkx as nx
G=nx.from_edgelist(L)

l=list(nx.connected_components(G))
# after that we create the map dict , for get the unique id for each nodes
mapdict={z:x for x, y in enumerate(l) for z in y }
# then append the id back to original data for groupby 
newlist=[ x+(mapdict[x[0]],)for  x in L]
import itertools
#using groupby make the same id into one sublist
newlist=sorted(newlist,key=lambda x : x[2])
yourlist=[list(y) for x , y in itertools.groupby(newlist,key=lambda x : x[2])]
yourlist
[[('A', 'B', 0), ('B', 'C', 0), ('C', 'D', 0)], [('E', 'F', 1)], [('G', 'H', 2), ('H', 'I', 2), ('G', 'I', 2), ('G', 'J', 2)]]

Then to match your output format:

L1,L2,L3=[[y[:2]for y in x] for x in yourlist]
L1
[('A', 'B'), ('B', 'C'), ('C', 'D')]
L2
[('E', 'F')]
L3
[('G', 'H'), ('H', 'I'), ('G', 'I'), ('G', 'J')]
like image 175
BENY Avatar answered Oct 20 '22 07:10

BENY


  • Initialise a list of groups as empty
  • Let (a, b) be the next pair
  • Collect all groups that contain any elements with a or b
  • Remove them all, join them, add (a, b), and insert as a new group
  • Repeat till done

That'd be something like this:

import itertools, functools

def partition(pred, iterable):
    t1, t2 = itertools.tee(iterable)
    return itertools.filterfalse(pred, t1), filter(pred, t2)

groups = []
for a, b in L:
    unrelated, related = partition(lambda group: any(aa == a or bb == b or aa == b or bb == a for aa, bb in group), groups)
    groups = [*unrelated, sum(related, [(a, b)])]
like image 2
Amadan Avatar answered Oct 20 '22 07:10

Amadan


You can use the following code:

l = [("A","B"), ("B","C"), ("C","D"), ("E","F"), ("G","H"), ("H","I"), ("G","I"), ("G","J")]

result = []
if len(l) > 1:
  tmp = [l[0]]
  for i in range(1,len(l)):
    if l[i][0] == l[i-1][1] or l[i][1] == l[i-1][0] or l[i][1] == l[i-1][1] or l[i][0] == l[i-1][0]:
      tmp.append(l[i])
    else:
      result.append(tmp)
      tmp = [l[i]]
  result.append(tmp)
else:
  result = l

for elem in result:
  print(elem)

output:

[('A', 'B'), ('B', 'C'), ('C', 'D')]
[('E', 'F')]
[('G', 'H'), ('H', 'I'), ('G', 'I'), ('G', 'J')]

Note: this code is based on the hypothesis that your initial array is sorted. If this is not the case it will not work as it does only one pass on the whole list to create the groups (complexity O(n)).

Explanations:

  • result will store your groups
  • if len(l) > 1: if you have only one element in your list or an empty list no need to do any processing you have the answer
  • You will to a one pass on each element of the list and compare the 4 possible equality between the tuple at position i and the one at position i-1.
  • tmp is used to construct your groups, as long as the condition is met you add tuples to tmp
  • when the condition is not respected you add tmp (the current group that has been created to the result, reinitiate tmp with the current tuple) and you continue.
like image 1
Allan Avatar answered Oct 20 '22 07:10

Allan


An efficient and Pythonic approach is to convert the list of tuples to a set of frozensets as a pool of candidates, and in a while loop, create a set as group and use a nested while loop to keep expanding the group by adding the first candidate set and then performing set union with other candidate sets that intersects with the group until there is no more intersecting candidate, at which point go back to the outer loop to form a new group:

pool = set(map(frozenset, L))
groups = []
while pool:
    group = set()
    groups.append([])
    while True:
        for candidate in pool:
            if not group or group & candidate:
                group |= candidate
                groups[-1].append(tuple(candidate))
                pool.remove(candidate)
                break
        else:
            break

Given your sample input, groups will become:

[[('A', 'B'), ('C', 'B'), ('C', 'D')],
 [('G', 'H'), ('H', 'I'), ('G', 'J'), ('G', 'I')],
 [('E', 'F')]]

Keep in mind that sets are unordered in Python, which is why the order of the above output doesn't match your expected output, but for your purpose the order should not matter.

like image 1
blhsing Avatar answered Oct 20 '22 09:10

blhsing