 

Python intersection with substrings

I have two sets:

a = set(['this', 'is', 'an', 'apple!'])
b = set(['apple', 'orange'])

I want to find out whether any element of b appears in a, including as a substring of an element of a. Normally I would do:

c = a.intersection(b)

However, in this example that returns an empty set, because 'apple' != 'apple!'.
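A quick check of that behaviour, as a minimal sketch: set.intersection matches only elements that are exactly equal, while the in operator applied to two strings performs a substring test.

```python
a = {'this', 'is', 'an', 'apple!'}
b = {'apple', 'orange'}

# set.intersection keeps only elements that are exactly equal,
# so 'apple' in b does not match 'apple!' in a.
print(a.intersection(b))  # set()

# On two strings, `in` checks for a substring instead of equality.
print('apple' in 'apple!')  # True
```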

Assuming I cannot remove characters from a, and ideally without writing loops, is there a way to find a match?

Edit: I would like it to return the match from b. For example, I want to know whether 'apple' is in set a; I do not want it to return 'apple!'.

asked May 29 '16 at 22:05 by brian4342



2 Answers

Instead of checking equality with ==, you can use the in operator, which tests for a substring and therefore also covers exact equality:

>>> [x for ele in a for x in b if x in ele]
['apple']
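One caveat with the list comprehension: if a word from b occurs in several elements of a (for example both 'apple!' and a hypothetical 'apples'), it appears in the result once per match. A minimal sketch that returns each matching word from b only once uses a set comprehension with any(). It still performs up to len(a) * len(b) substring checks; the loops are only hidden inside the comprehension.

```python
a = {'this', 'is', 'an', 'apple!'}
b = {'apple', 'orange'}

# For each word in b, any() scans the elements of a and stops at the
# first one that contains the word as a substring.
matches = {word for word in b if any(word in ele for ele in a)}
print(matches)  # {'apple'}
```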
answered Sep 20 '22 at 13:09 by Ozgur Vatansever



Sets give you little benefit if you are not searching for exact matches. If the words in a always start with the word you are looking for (as 'apple!' starts with 'apple'), sorting and bisecting is more efficient: sorting costs O(n log n) and each lookup O(log n), compared with O(n^2) substring checks for the nested loop:

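from bisect import bisect_left

a = set(['this', 'is', 'an', 'apple!'])
b = set(['apple', 'orange'])

srt = sorted(a)
# bisect_left gives the first element >= word; if any element starts with
# word, that one does. i == len(srt) means word sorts after every element.
inter = []
for word in b:
    i = bisect_left(srt, word)
    if i < len(srt) and srt[i].startswith(word):
        inter.append(word)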
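Note that the sorted/bisect approach only finds words from b that are a prefix of some element of a. A word in the middle of an element, such as 'apple' inside a hypothetical 'pineapple', is missed. The sketch below wraps both approaches in two hypothetical helpers, prefix_matches and substring_matches, to show the difference. It uses bisect_left so that an element exactly equal to the word is found, and it guards against the insertion point falling past the end of the list.

```python
from bisect import bisect_left


def prefix_matches(a, b):
    """Words in b that are a prefix of some element of a (sort + bisect)."""
    srt = sorted(a)
    found = set()
    for word in b:
        # First element >= word; if any element starts with word, this one does.
        i = bisect_left(srt, word)
        if i < len(srt) and srt[i].startswith(word):
            found.add(word)
    return found


def substring_matches(a, b):
    """Words in b that occur anywhere inside some element of a."""
    return {word for word in b if any(word in ele for ele in a)}


a = {'this', 'is', 'an', 'pineapple'}  # 'pineapple' is a made-up example
b = {'apple', 'pine'}
print(sorted(prefix_matches(a, b)))     # ['pine']
print(sorted(substring_matches(a, b)))  # ['apple', 'pine']
```

The prefix version pays off when a is large and only prefix matches matter; if substrings can occur anywhere, the any() scan is the correct choice despite its higher cost.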
answered Sep 22 '22 at 13:09 by Padraic Cunningham
