I have two lists of lists:
a = [[0, 1, 5], [2], [3], [4], [6, 7], [8, 9, 10, 11], [12], [13], [14], [15]]
b = [[0, 1], [2, 3], [4], [5], [6, 7], [8, 9, 10, 11], [12], [13, 14], [15]]
How can I find the maximum overlap between the values of the lists and build a new list of lists with this maximum overlap.
In other words, I'm looking for a function f
which maximizes the list sizes by merging lists with overlap.
The desired result of function f
for this example would be:
f(a,b) = [[0, 1, 5], [2, 3], [4], [6, 7], [8, 9, 10, 11], [12], [13, 14], [15]]
You can use a variant of the disjoint-set structure to solve this problem: for each list [a,b,c]
you unify a
with b
and a
with c
. You do this for both lists and then derive the resulting roots.
Here there is a simply disjunct-set algorithm we can modify:
from collections import defaultdict
def parent(u,mapping):
if mapping[u] == u:
return u
mapping[u] = parent(mapping[u],mapping)
return mapping[u]
def relation(array,mapping=None):
if mapping is None:
mapping = {}
for e in array:
if len(e) > 0:
u = e[0]
if u not in mapping:
mapping[u] = u
for v in e[1:]:
if v not in mapping:
mapping[v] = v
mapping[parent(u,mapping)] = parent(v,mapping)
return mapping
def f(a,b):
mapping = {}
relation(a,mapping)
relation(b,mapping)
results = defaultdict(set)
for u in mapping.keys():
results[parent(u,mapping)].add(u)
return [list(x) for x in results.values()]
(boldface added for the semantical differences with the original union-set algorithm).
This produces:
>>> f(a,b)
[[2, 3], [4], [0, 1, 5], [6, 7], [8, 9, 10, 11], [12], [13, 14], [15]]
The result is not sorted, since we work with a set. Nevertheless, you can easily sort it on the first element of each tuple if you want by altering f
to:
def f(a,b):
mapping = {}
relation(a,mapping)
relation(b,mapping)
results = defaultdict(set)
for u in mapping.keys():
results[parent(u,mapping)].add(u)
return sorted([list(x) for x in results.values()],key=lambda t:t[0])
which produces:
>>> f(a,b)
[[0, 1, 5], [2, 3], [4], [6, 7], [8, 9, 10, 11], [12], [13, 14], [15]]
The nice thing with this solution is that it also works if there is overlap in a
or b
itself, and you can easily generalize the solution to work with an arbitrary amount of lists (for instance a
, b
and c
).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With