compare if an element exists in two lists

Tags:

list

What would be the easiest and the most elegant way of checking if an element exists in two given lists. For Example, i have two lists as follows ?

>>>a, b = ['a', 'b', 'g', 'r'], ['e', 'g', 'l', 1, 'w']

Now in the above given lists i want to check if there is any element that exists in both lists. Currently i am doing it as follows

def elm_exists(list_a, list_b):
    for elm in list_a:
        if elm in list_b:
            return True
    return False

is there a more elegant way of doing this ?

514

asked Jan 16 '13 06:01

3 Answers

check elements of a and b with this:

set(a).intersection(b)

example:

In [44]: nk=set(a).intersection(b)

In [45]: for x in a:
    ...:     if x in nk:
    ...:         print x, 'present in b'
    ...:     else:
    ...:         print x, 'absent in b'
    ...:         
a absent in b
b absent in b
g present in b
r absent in b

also:

In [47]: timeit(set(a) & set(b))
1000000 loops, best of 3: 940 ns per loop

In [48]: timeit(set(a).intersection(b))
1000000 loops, best of 3: 854 ns per loop

In [49]: timeit([x for x in a if x in b])
1000000 loops, best of 3: 1 us per loop

hence use set(a).intersection(b).

187

answered Oct 14 '22 13:10

namit

You don't need to convert both lists to sets, just one. I think skipping the unnecessary conversion makes it more readable and elegant.

So, either:

set(a).intersection(b)

Or:

s = set(a)
any(e in s for e in b)

The latter has the advantage of short-circuiting as soon as it finds one match, and better expressing the logic, and returning True or False instead of a non-falsey or falsey set, but it is two lines instead of one, if that bothers you. I have no idea whether this advantage cancels out the cost of putting the loop inside a generator expression instead of inside a C function.

Performance results with lists this small are almost meaningless, so let's try this:

In [373]: a=[random.choice(string.ascii_lowercase) for _ in range(10000)]
In [374]: b=[random.choice(string.ascii_lowercase) for _ in range(10000)]
In [375]: %timeit(set(a))
10000 loops, best of 3: 180 us per loop
In [376]: s=set(a) # since all versions need to do this
In [391]: %timeit(s & set(b))
1000000 loops, best of 3: 178 us per loop
In [392]: %timeit(s.intersection(b))
1000000 loops, best of 3: 247 us per loop
In [393]: %timeit(discard(e in s for e in b))
1000000 loops, best of 3: 550 ns per loop
In [394]: %timeit(any(e in s for e in b))
1000000 loops, best of 3: 749 ns per loop
In [395]: %timeit(any(e in a for e in b))
1000000 loops, best of 3: 1.42 us per loop

To put the numbers all in the nanosecond scale, add back in the cost of the set(a) that all but the last need, and compare the same tests from three Python versions (Apple stock CPython 2.7.2, python.org CPython 3.3.0, Homebrew PyPy 1.9.0/2.7.2, all 64-bit Mac builds):

                  CP272  CP330  PyPy
s & set(b)        358000 316000 180500
s.intersection(b) 427000 459000 180900
discard(genexp)   180550 157341  90094
any(genexp)       180749 157334  90208
any(list-genexp)    1420    686    960

Now that I think about it, this makes total sense. The odds of getting a collision pretty early on are very high, so the cost of converting the whole thing to a set dominates everything.

Which means we need a new test, with 10000 unique values. Let's repeat the test, with this:

In [29]: a, b = list(range(10000)), list(range(10000))
In [30]: random.shuffle(a)
In [31]: random.shuffle(b)

                  CP272  CP330  PyPy
s & set(b)        1277000 1168000 1141000
s.intersection(b) 1165000 1117000 2520000
discard(genexp)   1699000 1271000  770000
any(genexp)        389800  344543  320807
any(list-genexp)    62000   10400    1520

These are more reasonable. And they still make sense. If you're comparing the same 10000 elements randomly shuffled, how far into each do you have to go? Not far enough to make the cost of set-ifying either of the lists worth doing, much less both of them!

So, let's try a case where there are no matches:

In [43]: a=list(range(10000, 20000))


                  CP272     CP330     PyPy
s & set(b)           751000    770000  733000
s.intersection(b)    466000    530000 1920000
discard(genexp)     1246000    985000  749000
any(genexp)         1269000    966000  893000
any(list-genexp)  185000000 176000000 5870000

I have no idea how PyPy did the last one so fast, but other than that, no surprises here.

So, which one is best?

Clearly, if you expect lots of collisions, you want to avoid making the sets whenever possible—but if you expect few collisions, you want to make at least one set. If you have no idea, I think the safest bet is any(genexp)—at worst it's less than 3x as bad as the best, and if there's a chance the collision rate is high, it'll be a lot faster. But you can look at the numbers and see for yourself.

Or, better, of course, time them all against real test data you expect to encounter.

answered Oct 14 '22 15:10

abarnert

This works:

>>> l1,l2=['a', 'b', 'g', 'r'], ['e', 'g', 'l', 1, 'w']
>>> list(set(l1)&set(l2))
['g']

answered Oct 14 '22 15:10

the wolf

Related questions
                            
                                Mean Median Mode lines showing only in last graph in seaborn
                            
                                Django custom authentication back-end doesn't work
                            
                                I am getting an error : rest_framework.request.WrappedAttributeError: 'CSRFCheck' object has no attribute 'process_request'
                            
                                Get part of day (morning, afternoon, evening, night) in Python dataframe
                            
                                What is the difference between .one() and .scalar()
                            
                                Why can't I import candlestick_ohlc from mplfinance
                            
                                Cannot read pickle file in Cloud Run App. TypeError: __cinit__() takes at least 2 positional arguments
                            
                                How do I write a float list of lists to file in Python
                            
                                Django-admin : How to display link to object info page instead of edit form , in records change list?
                            
                                Python: how can I handle any unhandled exception in an alternative way?
                            
                                List filtering and transformation
                            
                                how to find whether a string is contained in another string
                            
                                How can I maximize a specific window with Python?
                            
                                Download files from a list if not already downloaded
                            
                                Condensed matrix function to find pairs
                            
                                python head, tail and backward read by lines of a text file
                            
                                Very simple concurrent programming in Python
                            
                                simple python oAuth 1.0 example with consumer key and secret
                            
                                Python nested loop with generators does not work (in some cases)?
                            
                                Django runserver in command with custom settings

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

compare if an element exists in two lists

Tags:

python

list

Amyth

People also ask

3 Answers

namit

abarnert

the wolf

Recent Activity

Donate For Us