Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

compare if an element exists in two lists

Tags:

python

list

What would be the easiest and the most elegant way of checking if an element exists in two given lists. For Example, i have two lists as follows ?

>>>a, b = ['a', 'b', 'g', 'r'], ['e', 'g', 'l', 1, 'w']

Now in the above given lists i want to check if there is any element that exists in both lists. Currently i am doing it as follows

def elm_exists(list_a, list_b):
    for elm in list_a:
        if elm in list_b:
            return True
    return False

is there a more elegant way of doing this ?

like image 514
Amyth Avatar asked Jan 16 '13 06:01

Amyth


People also ask

How do you compare elements of two different lists?

counter() method can be used to compare lists efficiently. The counter() function counts the frequency of the items in a list and stores the data as a dictionary in the format <value>:<frequency>. If two lists have the exact same dictionary output, we can infer that the lists are the same.

How do you check if an element of a list is in another list?

There are 2 ways to understand check if the list contains elements of another list. First, use all() functions to check if a Python list contains all the elements of another list. And second, use any() function to check if the list contains any elements of another one.

How do you check if a common element is in two lists in Python?

Use the intersection function to check if both sets have any elements in common. If they have many elements in common, then print the intersection of both sets.

Can you use == for lists in Python?

A straightforward way to check the equality of the two lists in Python is by using the equality == operator. When the equality == is used on the list type in Python, it returns True if the lists are equal and False if they are not.


3 Answers

check elements of a and b with this:

set(a).intersection(b)

example:

In [44]: nk=set(a).intersection(b)

In [45]: for x in a:
    ...:     if x in nk:
    ...:         print x, 'present in b'
    ...:     else:
    ...:         print x, 'absent in b'
    ...:         
a absent in b
b absent in b
g present in b
r absent in b

also:

In [47]: timeit(set(a) & set(b))
1000000 loops, best of 3: 940 ns per loop

In [48]: timeit(set(a).intersection(b))
1000000 loops, best of 3: 854 ns per loop

In [49]: timeit([x for x in a if x in b])
1000000 loops, best of 3: 1 us per loop

hence use set(a).intersection(b).

like image 187
namit Avatar answered Oct 14 '22 13:10

namit


You don't need to convert both lists to sets, just one. I think skipping the unnecessary conversion makes it more readable and elegant.

So, either:

set(a).intersection(b)

Or:

s = set(a)
any(e in s for e in b)

The latter has the advantage of short-circuiting as soon as it finds one match, and better expressing the logic, and returning True or False instead of a non-falsey or falsey set, but it is two lines instead of one, if that bothers you. I have no idea whether this advantage cancels out the cost of putting the loop inside a generator expression instead of inside a C function.

Performance results with lists this small are almost meaningless, so let's try this:

In [373]: a=[random.choice(string.ascii_lowercase) for _ in range(10000)]
In [374]: b=[random.choice(string.ascii_lowercase) for _ in range(10000)]
In [375]: %timeit(set(a))
10000 loops, best of 3: 180 us per loop
In [376]: s=set(a) # since all versions need to do this
In [391]: %timeit(s & set(b))
1000000 loops, best of 3: 178 us per loop
In [392]: %timeit(s.intersection(b))
1000000 loops, best of 3: 247 us per loop
In [393]: %timeit(discard(e in s for e in b))
1000000 loops, best of 3: 550 ns per loop
In [394]: %timeit(any(e in s for e in b))
1000000 loops, best of 3: 749 ns per loop
In [395]: %timeit(any(e in a for e in b))
1000000 loops, best of 3: 1.42 us per loop

To put the numbers all in the nanosecond scale, add back in the cost of the set(a) that all but the last need, and compare the same tests from three Python versions (Apple stock CPython 2.7.2, python.org CPython 3.3.0, Homebrew PyPy 1.9.0/2.7.2, all 64-bit Mac builds):

                  CP272  CP330  PyPy
s & set(b)        358000 316000 180500
s.intersection(b) 427000 459000 180900
discard(genexp)   180550 157341  90094
any(genexp)       180749 157334  90208
any(list-genexp)    1420    686    960

Now that I think about it, this makes total sense. The odds of getting a collision pretty early on are very high, so the cost of converting the whole thing to a set dominates everything.

Which means we need a new test, with 10000 unique values. Let's repeat the test, with this:

In [29]: a, b = list(range(10000)), list(range(10000))
In [30]: random.shuffle(a)
In [31]: random.shuffle(b)

                  CP272  CP330  PyPy
s & set(b)        1277000 1168000 1141000
s.intersection(b) 1165000 1117000 2520000
discard(genexp)   1699000 1271000  770000
any(genexp)        389800  344543  320807
any(list-genexp)    62000   10400    1520

These are more reasonable. And they still make sense. If you're comparing the same 10000 elements randomly shuffled, how far into each do you have to go? Not far enough to make the cost of set-ifying either of the lists worth doing, much less both of them!

So, let's try a case where there are no matches:

In [43]: a=list(range(10000, 20000))


                  CP272     CP330     PyPy
s & set(b)           751000    770000  733000
s.intersection(b)    466000    530000 1920000
discard(genexp)     1246000    985000  749000
any(genexp)         1269000    966000  893000
any(list-genexp)  185000000 176000000 5870000

I have no idea how PyPy did the last one so fast, but other than that, no surprises here.

So, which one is best?

Clearly, if you expect lots of collisions, you want to avoid making the sets whenever possible—but if you expect few collisions, you want to make at least one set. If you have no idea, I think the safest bet is any(genexp)—at worst it's less than 3x as bad as the best, and if there's a chance the collision rate is high, it'll be a lot faster. But you can look at the numbers and see for yourself.

Or, better, of course, time them all against real test data you expect to encounter.

like image 36
abarnert Avatar answered Oct 14 '22 15:10

abarnert


This works:

>>> l1,l2=['a', 'b', 'g', 'r'], ['e', 'g', 'l', 1, 'w']
>>> list(set(l1)&set(l2))
['g']
like image 25
the wolf Avatar answered Oct 14 '22 15:10

the wolf