What would be the easiest and the most elegant way of checking if an element exists in two given lists. For Example, i have two lists as follows ?
>>>a, b = ['a', 'b', 'g', 'r'], ['e', 'g', 'l', 1, 'w']
Now in the above given lists i want to check if there is any element that exists in both lists. Currently i am doing it as follows
def elm_exists(list_a, list_b):
for elm in list_a:
if elm in list_b:
return True
return False
is there a more elegant way of doing this ?
counter() method can be used to compare lists efficiently. The counter() function counts the frequency of the items in a list and stores the data as a dictionary in the format <value>:<frequency>. If two lists have the exact same dictionary output, we can infer that the lists are the same.
There are 2 ways to understand check if the list contains elements of another list. First, use all() functions to check if a Python list contains all the elements of another list. And second, use any() function to check if the list contains any elements of another one.
Use the intersection function to check if both sets have any elements in common. If they have many elements in common, then print the intersection of both sets.
A straightforward way to check the equality of the two lists in Python is by using the equality == operator. When the equality == is used on the list type in Python, it returns True if the lists are equal and False if they are not.
check elements of a
and b
with this:
set(a).intersection(b)
example:
In [44]: nk=set(a).intersection(b)
In [45]: for x in a:
...: if x in nk:
...: print x, 'present in b'
...: else:
...: print x, 'absent in b'
...:
a absent in b
b absent in b
g present in b
r absent in b
also:
In [47]: timeit(set(a) & set(b))
1000000 loops, best of 3: 940 ns per loop
In [48]: timeit(set(a).intersection(b))
1000000 loops, best of 3: 854 ns per loop
In [49]: timeit([x for x in a if x in b])
1000000 loops, best of 3: 1 us per loop
hence use set(a).intersection(b)
.
You don't need to convert both list
s to set
s, just one. I think skipping the unnecessary conversion makes it more readable and elegant.
So, either:
set(a).intersection(b)
Or:
s = set(a)
any(e in s for e in b)
The latter has the advantage of short-circuiting as soon as it finds one match, and better expressing the logic, and returning True
or False
instead of a non-falsey or falsey set
, but it is two lines instead of one, if that bothers you. I have no idea whether this advantage cancels out the cost of putting the loop inside a generator expression instead of inside a C function.
Performance results with list
s this small are almost meaningless, so let's try this:
In [373]: a=[random.choice(string.ascii_lowercase) for _ in range(10000)]
In [374]: b=[random.choice(string.ascii_lowercase) for _ in range(10000)]
In [375]: %timeit(set(a))
10000 loops, best of 3: 180 us per loop
In [376]: s=set(a) # since all versions need to do this
In [391]: %timeit(s & set(b))
1000000 loops, best of 3: 178 us per loop
In [392]: %timeit(s.intersection(b))
1000000 loops, best of 3: 247 us per loop
In [393]: %timeit(discard(e in s for e in b))
1000000 loops, best of 3: 550 ns per loop
In [394]: %timeit(any(e in s for e in b))
1000000 loops, best of 3: 749 ns per loop
In [395]: %timeit(any(e in a for e in b))
1000000 loops, best of 3: 1.42 us per loop
To put the numbers all in the nanosecond scale, add back in the cost of the set(a)
that all but the last need, and compare the same tests from three Python versions (Apple stock CPython 2.7.2, python.org CPython 3.3.0, Homebrew PyPy 1.9.0/2.7.2, all 64-bit Mac builds):
CP272 CP330 PyPy
s & set(b) 358000 316000 180500
s.intersection(b) 427000 459000 180900
discard(genexp) 180550 157341 90094
any(genexp) 180749 157334 90208
any(list-genexp) 1420 686 960
Now that I think about it, this makes total sense. The odds of getting a collision pretty early on are very high, so the cost of converting the whole thing to a set dominates everything.
Which means we need a new test, with 10000 unique values. Let's repeat the test, with this:
In [29]: a, b = list(range(10000)), list(range(10000))
In [30]: random.shuffle(a)
In [31]: random.shuffle(b)
CP272 CP330 PyPy
s & set(b) 1277000 1168000 1141000
s.intersection(b) 1165000 1117000 2520000
discard(genexp) 1699000 1271000 770000
any(genexp) 389800 344543 320807
any(list-genexp) 62000 10400 1520
These are more reasonable. And they still make sense. If you're comparing the same 10000 elements randomly shuffled, how far into each do you have to go? Not far enough to make the cost of set
-ifying either of the lists worth doing, much less both of them!
So, let's try a case where there are no matches:
In [43]: a=list(range(10000, 20000))
CP272 CP330 PyPy
s & set(b) 751000 770000 733000
s.intersection(b) 466000 530000 1920000
discard(genexp) 1246000 985000 749000
any(genexp) 1269000 966000 893000
any(list-genexp) 185000000 176000000 5870000
I have no idea how PyPy did the last one so fast, but other than that, no surprises here.
So, which one is best?
Clearly, if you expect lots of collisions, you want to avoid making the sets whenever possible—but if you expect few collisions, you want to make at least one set. If you have no idea, I think the safest bet is any(genexp)
—at worst it's less than 3x as bad as the best, and if there's a chance the collision rate is high, it'll be a lot faster. But you can look at the numbers and see for yourself.
Or, better, of course, time them all against real test data you expect to encounter.
This works:
>>> l1,l2=['a', 'b', 'g', 'r'], ['e', 'g', 'l', 1, 'w']
>>> list(set(l1)&set(l2))
['g']
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With