I have two sets:
a = set(['this', 'is', 'an', 'apple!'])
b = set(['apple', 'orange'])
I want to find out whether any element of (b) occurs in (a), including as a substring. Normally I would do:
c = a.intersection(b)
However, in this example it returns an empty set, because 'apple' != 'apple!'.
Assuming I cannot remove characters from (a), and hopefully without writing explicit loops, is there a way for me to find a match?
Edit: I would like it to return the match from (b). That is, I want to know whether 'apple' is in set (a); I do not want it to return 'apple!'.
Instead of doing the equality check via ==, you can use in for a substring match, which also covers equality:
>>> [x for ele in a for x in b if x in ele]
['apple']
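If a set is wanted back, as with intersection, a set comprehension with any() avoids duplicate results when a word from (b) appears in more than one element of (a). A minimal sketch, reusing a and b from the question; the name matches is only illustrative:

```python
a = {'this', 'is', 'an', 'apple!'}
b = {'apple', 'orange'}

# Keep each word from b that occurs as a substring of at least one element of a.
# any() stops scanning a as soon as one containing element is found.
matches = {word for word in b if any(word in ele for ele in a)}
print(matches)  # {'apple'}
```

This still compares every word in b against the elements of a in the worst case, so it is a readability improvement rather than a faster algorithm.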
Using sets is actually of little benefit if you are not searching for exact matches. If the words you are looking for always appear at the start of an element (a prefix match), sorting and bisecting will be a more efficient approach, i.e. O(n log n) for the sort plus O(log n) per lookup, vs O(n^2) for comparing every pair:
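from bisect import bisect_left

a = set(['this', 'is', 'an', 'apple!'])
b = set(['apple', 'orange'])
srt = sorted(a)

def has_prefix_match(word):
    # bisect_left finds the first element >= word; if any element starts with word, that one does.
    i = bisect_left(srt, word)
    # Guard against i == len(srt), which happens when word sorts after every element.
    return i < len(srt) and srt[i].startswith(word)

inter = [word for word in b if has_prefix_match(word)]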
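Note that the bisect approach only finds words that occur at the start of an element, whereas the in check finds them anywhere. A small sketch of the difference, assuming two hypothetical helpers, prefix_matches and substring_matches, that are not part of the original answers:

```python
from bisect import bisect_left

def prefix_matches(words, candidates):
    """Return the candidates that are a prefix of at least one word."""
    srt = sorted(words)
    found = set()
    for cand in candidates:
        i = bisect_left(srt, cand)  # index of the first word >= cand
        # The bounds check covers candidates that sort after every word.
        if i < len(srt) and srt[i].startswith(cand):
            found.add(cand)
    return found

def substring_matches(words, candidates):
    """Return the candidates that occur anywhere inside at least one word."""
    return {cand for cand in candidates if any(cand in w for w in words)}

words = {'this', 'is', 'a', 'pineapple!'}
candidates = {'apple', 'orange'}
print(prefix_matches(words, candidates))     # set()
print(substring_matches(words, candidates))  # {'apple'}
```

If the match can appear in the middle of an element, as with 'pineapple!' here, the sorted lookup misses it and the slower pairwise check is the safer choice.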