Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Check if elements occur together in all lists?

Say I have a list of lists like so:

l = [[1,2,3],[6,5,4,3,7,2],[4,3,2,9],[6,7],[5,1,0],[6,3,2,7]]

How would I write python code to check if there are elements that always occur together? For example, in the above example 2,3 and 6,7 always occur in the same lists. (there may be others, not sure).

What's the easiest to understand way of achieving this?

My only idea is to convert inner-list1 to set and check intersection with inner-list2 but when I check intersection with inner-list3, those elements may not occur at all in inner-list3.

Could I do something like:

for i in range(0,len(lists)):    
    a=set(lists[i]).intersection(lists[i+1])
    if (len(a))==0:
        continue
    else:
        a.intersection(lists[i+1])

This of course doesn't work but how could I formally code this or is there a better approach to this?

like image 825
doddy Avatar asked Nov 11 '17 17:11

doddy


3 Answers

First, the data

data = [[1,2,3],[6,5,4,3,7,2],[4,3,2,9],[6,7],[5,1,0],[6,3,2,7]]

Generating combinations is expensive, so I wanted to avoid that as much as possible.

My "Eureka!" moment came when I realized I don't have to generate all the pairs. Instead, I can map each number to all the lists that contain it.

appears_in = defaultdict(set)
for g in groups:
    for number in g:
        appears_in[number].add(tuple(g))

The resulting dictionary is

{0: {(5, 1, 0)},
 1: {(5, 1, 0), (1, 2, 3)},
 2: {(4, 3, 2, 9), (6, 3, 2, 7), (6, 5, 4, 3, 7, 2), (1, 2, 3)},
 3: {(4, 3, 2, 9), (6, 3, 2, 7), (6, 5, 4, 3, 7, 2), (1, 2, 3)},
 4: {(4, 3, 2, 9), (6, 5, 4, 3, 7, 2)},
 5: {(5, 1, 0), (6, 5, 4, 3, 7, 2)},
 6: {(6, 3, 2, 7), (6, 7), (6, 5, 4, 3, 7, 2)},
 7: {(6, 3, 2, 7), (6, 7), (6, 5, 4, 3, 7, 2)},
 9: {(4, 3, 2, 9)}}

Look at the entries for 2 and 3

2: {(4, 3, 2, 9), (6, 3, 2, 7), (6, 5, 4, 3, 7, 2), (1, 2, 3)},
3: {(4, 3, 2, 9), (6, 3, 2, 7), (6, 5, 4, 3, 7, 2), (1, 2, 3)},

The set of lists containing 2 is identical to the set of lists containing 3. So I conclude that 2 and 3 always appear together.

Contrast this with 3 and 4

 3: {(4, 3, 2, 9), (6, 3, 2, 7), (6, 5, 4, 3, 7, 2), (1, 2, 3)},
 4: {(4, 3, 2, 9),               (6, 5, 4, 3, 7, 2)},

Notice the gaps where (6, 3, 2, 7) and (1, 2, 3) should be. I conclude that 3 and 4 do NOT always appear together.

Here is the complete code

from collections import defaultdict
from itertools import combinations
from pprint import pprint

def always_appear_together(groups):
    appears_in = defaultdict(set)
    for g in groups:
        for number in g:
            appears_in[number].add(tuple(g))
    #pprint(appears_in)    # for debugging                                                                                                                        
    return [
        (i,j) 
        for (i,val_i),(j,val_j) in combinations(appears_in.items(),2) 
        if val_i == val_j
    ]

Running this gives

print(always_appear_together(data))
[(2, 3), (6, 7)]
like image 67
Marc Poulin Avatar answered Nov 15 '22 13:11

Marc Poulin


You can do this using set intersections, and it also works nicely for 3 or more elements per group: Note that I added an 8 to the 6,7 group.

lists = [[1,2,3], [6,5,4,3,7,2,8], [4,3,2,9], [8,6,7], [5,1,0], [6,3,8,2,7]]

First, we map each element to sets of all the other elements it appears together with:

groups = {}
for lst in lists:
    for x in lst:
        if x not in groups:
            groups[x] = set(lst)
        else:
            groups[x].intersection_update(lst)
# {0: {0, 1, 5}, 1: {1}, 2: {2, 3}, 3: {2, 3}, 4: {2, 3, 4}, 5: {5}, 
#  6: {8, 6, 7}, 7: {8, 6, 7}, 8: {8, 6, 7}, 9: {9, 2, 3, 4}}

Next, we retain only those elements where the relationship is bidirectional:

groups2 = {k: {v for v in groups[k] if k in groups[v]} for k in groups}
# {0: {0}, 1: {1}, 2: {2, 3}, 3: {2, 3}, 4: {4}, 5: {5}, 
#  6: {8, 6, 7}, 7: {8, 6, 7}, 8: {8, 6, 7}, 9: {9}}

Finally, we get the unique groups with more than one element:

groups3 = {frozenset(v) for v in groups2.values() if len(v) > 1}
# {frozenset({8, 6, 7}), frozenset({2, 3})}
like image 28
tobias_k Avatar answered Nov 15 '22 13:11

tobias_k


Using itertools.combinations:

I initially thought of using something with itertools.combination, but as this allows elements from a list which are not next to each other, it wasn't going to work for the solution I had in mind.

Turns out that when looking at non-numerical input lists, itertools.combinations is necessary in both cases. I had been confused because I assumed the groups had to be adjacent.

The way I thought would work best for this would be to generate the possible elements that could work and then check each one of these with a function against the list of sub-lists - as opposed to doing some kind of combinatoric work on the list and going down that path.

So to check if a list of possible elements is 'valid' i.e. if all the elements only occur together, I used a simple if with a generator with the all() and any() built-in functions to do this part of the job.

Now this was working, there needed to be a way of generating the potential elements that could occur. I just did this with two nested for-loops - one iterating over the width of the window, and one iterating over where the start of the window is.

Then from here, we just check if that set of elements is valid and add it to another list if it is!


import itertools

def valid(p):
    for s in l:
        if any(e in s for e in p) and not all(e in s for e in p):
            return False
    return True

l = [[1,2,3],[6,5,4,3,7,2],[4,3,2,9],[6,7],[5,1,0],[6,3,2,7]]
els = list(set(b for a in l for b in a))
sol = []
for w in range(2,len(els)+1):
    for c in itertools.combinations(els, w):
        if valid(c):
            sol.append(c)

which gives sol as:

[(2, 3), (6, 7)]]

These 2 nested for-loops can actually be thrown together into a nice one-liner (not sure if others think it is Pythonic):

sol = [c for w in range(2, len(els)+1) for c in itertools.combinations(els, w) if valid(c)]

which works just the same but is simply shorter.


Due to popular demand (@Arman), I have updated the answer so that it should now work for other elements apart from 0-9. This was done with the introduction of a unique elements list (els).


And some tests from @thanasisp with the same code from above:

l = [[1, 3, 5, 7],[1, 3, 5, 7]]

gives sol as:

[(1, 3), (1, 5), (1, 7), (3, 5), (3, 7), (5, 7), (1, 3, 5), (1, 3, 7), (1, 5, 7), (3, 5, 7), (1, 3, 5, 7)]

and again with:

 l = [[1, 2, 3, 5, 7], [1, 3, 5, 7]]

gives:

 [(1, 3), (1, 5), (1, 7), (3, 5), (3, 7), (5, 7), (1, 3, 5), (1, 3, 7), (1, 5, 7), (3, 5, 7), (1, 3, 5, 7)]

which I believe is correct as the 2 shouldn't be in any groups as all other elements are in a different sub-list, so it can never make a group with another element.

like image 29
Joe Iddon Avatar answered Nov 15 '22 13:11

Joe Iddon