Say I have a list of lists like so:
l = [[1,2,3],[6,5,4,3,7,2],[4,3,2,9],[6,7],[5,1,0],[6,3,2,7]]
How would I write python code to check if there are elements that always occur together? For example, in the above example 2,3 and 6,7 always occur in the same lists. (there may be others, not sure).
What's the easiest to understand way of achieving this?
My only idea is to convert inner-list1
to set and check intersection with inner-list2
but when I check intersection with inner-list3
, those elements may not occur at all in inner-list3
.
Could I do something like:
for i in range(0,len(lists)):
a=set(lists[i]).intersection(lists[i+1])
if (len(a))==0:
continue
else:
a.intersection(lists[i+1])
This of course doesn't work but how could I formally code this or is there a better approach to this?
First, the data
data = [[1,2,3],[6,5,4,3,7,2],[4,3,2,9],[6,7],[5,1,0],[6,3,2,7]]
Generating combinations is expensive, so I wanted to avoid that as much as possible.
My "Eureka!" moment came when I realized I don't have to generate all the pairs. Instead, I can map each number to all the lists that contain it.
appears_in = defaultdict(set)
for g in groups:
for number in g:
appears_in[number].add(tuple(g))
The resulting dictionary is
{0: {(5, 1, 0)},
1: {(5, 1, 0), (1, 2, 3)},
2: {(4, 3, 2, 9), (6, 3, 2, 7), (6, 5, 4, 3, 7, 2), (1, 2, 3)},
3: {(4, 3, 2, 9), (6, 3, 2, 7), (6, 5, 4, 3, 7, 2), (1, 2, 3)},
4: {(4, 3, 2, 9), (6, 5, 4, 3, 7, 2)},
5: {(5, 1, 0), (6, 5, 4, 3, 7, 2)},
6: {(6, 3, 2, 7), (6, 7), (6, 5, 4, 3, 7, 2)},
7: {(6, 3, 2, 7), (6, 7), (6, 5, 4, 3, 7, 2)},
9: {(4, 3, 2, 9)}}
Look at the entries for 2 and 3
2: {(4, 3, 2, 9), (6, 3, 2, 7), (6, 5, 4, 3, 7, 2), (1, 2, 3)},
3: {(4, 3, 2, 9), (6, 3, 2, 7), (6, 5, 4, 3, 7, 2), (1, 2, 3)},
The set of lists containing 2 is identical to the set of lists containing 3. So I conclude that 2 and 3 always appear together.
Contrast this with 3 and 4
3: {(4, 3, 2, 9), (6, 3, 2, 7), (6, 5, 4, 3, 7, 2), (1, 2, 3)},
4: {(4, 3, 2, 9), (6, 5, 4, 3, 7, 2)},
Notice the gaps where (6, 3, 2, 7)
and (1, 2, 3)
should be. I conclude that 3 and 4 do NOT always appear together.
Here is the complete code
from collections import defaultdict
from itertools import combinations
from pprint import pprint
def always_appear_together(groups):
appears_in = defaultdict(set)
for g in groups:
for number in g:
appears_in[number].add(tuple(g))
#pprint(appears_in) # for debugging
return [
(i,j)
for (i,val_i),(j,val_j) in combinations(appears_in.items(),2)
if val_i == val_j
]
Running this gives
print(always_appear_together(data))
[(2, 3), (6, 7)]
You can do this using set intersections, and it also works nicely for 3 or more elements per group: Note that I added an 8
to the 6,7
group.
lists = [[1,2,3], [6,5,4,3,7,2,8], [4,3,2,9], [8,6,7], [5,1,0], [6,3,8,2,7]]
First, we map each element to sets of all the other elements it appears together with:
groups = {}
for lst in lists:
for x in lst:
if x not in groups:
groups[x] = set(lst)
else:
groups[x].intersection_update(lst)
# {0: {0, 1, 5}, 1: {1}, 2: {2, 3}, 3: {2, 3}, 4: {2, 3, 4}, 5: {5},
# 6: {8, 6, 7}, 7: {8, 6, 7}, 8: {8, 6, 7}, 9: {9, 2, 3, 4}}
Next, we retain only those elements where the relationship is bidirectional:
groups2 = {k: {v for v in groups[k] if k in groups[v]} for k in groups}
# {0: {0}, 1: {1}, 2: {2, 3}, 3: {2, 3}, 4: {4}, 5: {5},
# 6: {8, 6, 7}, 7: {8, 6, 7}, 8: {8, 6, 7}, 9: {9}}
Finally, we get the unique groups with more than one element:
groups3 = {frozenset(v) for v in groups2.values() if len(v) > 1}
# {frozenset({8, 6, 7}), frozenset({2, 3})}
itertools.combinations
:I initially thought of using something with itertools.combination
, but as this allows elements
from a list
which are not next to each other, it wasn't going to work for the solution I had in mind.
Turns out that when looking at non-numerical input lists
, itertools.combinations
is necessary in both cases. I had been confused because I assumed the groups
had to be adjacent
.
The way I thought would work best for this would be to generate the possible elements
that could work and then check each one of these with a function
against the list
of sub-lists
- as opposed to doing some kind of combinatoric work on the list
and going down that path.
So to check if a list
of possible elements
is 'valid' i.e. if all the elements
only occur together, I used a simple if
with a generator with the all()
and any()
built-in functions
to do this part of the job.
Now this was working, there needed to be a way of generating the potential elements
that could occur. I just did this with two nested for-loops
- one iterating
over the width
of the window
, and one iterating
over where the start
of the window
is.
Then from here, we just check if that set of elements
is valid
and add it to another list
if it is!
import itertools
def valid(p):
for s in l:
if any(e in s for e in p) and not all(e in s for e in p):
return False
return True
l = [[1,2,3],[6,5,4,3,7,2],[4,3,2,9],[6,7],[5,1,0],[6,3,2,7]]
els = list(set(b for a in l for b in a))
sol = []
for w in range(2,len(els)+1):
for c in itertools.combinations(els, w):
if valid(c):
sol.append(c)
which gives sol
as:
[(2, 3), (6, 7)]]
These 2
nested for-loops
can actually be thrown together into a nice one-liner
(not sure if others think it is Pythonic):
sol = [c for w in range(2, len(els)+1) for c in itertools.combinations(els, w) if valid(c)]
which works just the same but is simply shorter.
Due to popular demand (@Arman
), I have updated the answer so that it should now work for other elements
apart from 0-9
. This was done with the introduction of a unique elements
list
(els
).
And some tests from @thanasisp
with the same code from above:
l = [[1, 3, 5, 7],[1, 3, 5, 7]]
gives sol
as:
[(1, 3), (1, 5), (1, 7), (3, 5), (3, 7), (5, 7), (1, 3, 5), (1, 3, 7), (1, 5, 7), (3, 5, 7), (1, 3, 5, 7)]
and again with:
l = [[1, 2, 3, 5, 7], [1, 3, 5, 7]]
gives:
[(1, 3), (1, 5), (1, 7), (3, 5), (3, 7), (5, 7), (1, 3, 5), (1, 3, 7), (1, 5, 7), (3, 5, 7), (1, 3, 5, 7)]
which I believe is correct as the 2
shouldn't be in any groups as all other elements
are in a different sub-list
, so it can never make a group with another element
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With