I have two sets of strings:
set_a = {'abcd', 'efgh', 'ghij'}
set_b = {'abce', 'efgk', 'ghij'}
I want to find the intersection of these two sets, but with element equality defined as follows:
def match(string_a, string_b, threshold=0.8):
    # lcs() is my helper (not shown) that returns the length of the LCS
    lcs_len = lcs(string_a, string_b)
    return lcs_len / max(len(string_a), len(string_b)) >= threshold
Basically, if the LCS is at least 80% of the length of the longer string, we consider the two strings a close enough match. I know sorting functions accept custom comparators (or key functions) that work something like this, but I haven't found anything similar for set operations.
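The question doesn't show lcs. Here is a minimal sketch of it and of match, assuming lcs means the length of the longest common subsequence (the standard dynamic-programming recurrence) rather than the longest common substring. Treating two empty strings as a match is also an assumption:

```python
def lcs(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (assumed meaning of lcs)."""
    # prev[j] holds the LCS length of the prefix of a processed so far and b[:j]
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1  # extend the diagonal match
            else:
                curr[j] = max(prev[j], curr[j - 1])  # drop a char from one side
        prev = curr
    return prev[-1]


def match(string_a: str, string_b: str, threshold: float = 0.8) -> bool:
    """True if the LCS covers at least `threshold` of the longer string."""
    longest = max(len(string_a), len(string_b))
    if longest == 0:
        return True  # assumption: two empty strings count as equal
    return lcs(string_a, string_b) / longest >= threshold
```

With the sample data, each pair of four-character strings that differs in one character has an LCS ratio of exactly 3/4 = 0.75, so the 0.8 threshold only accepts the identical pair ('ghij', 'ghij'); a threshold of 0.75 would also accept 'abcd'/'abce' and 'efgh'/'efgk'.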
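You can iterate over the Cartesian product of both sets and keep each element of set_a that satisfies your predicate with at least one element of set_b. (Don't also require `i in set_b`: that is an exact-equality check, so it would throw away every fuzzy match.)
from itertools import product
# keep each element of set_a that matches at least one element of set_b
{a for a, b in product(set_a, set_b) if match(a, b)}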
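A sketch of two variations on that comprehension, assuming only that match is some boolean predicate taking two strings. The helper names fuzzy_intersection and matching_pairs are hypothetical, and same_prefix is a stand-in predicate used purely for the demo. The first stops scanning set_b as soon as one partner matches; the second keeps both sides of every matching pair:

```python
from itertools import product
from typing import Callable, Iterable, Set, Tuple


def fuzzy_intersection(
    set_a: Iterable[str],
    set_b: Iterable[str],
    match: Callable[[str, str], bool],
) -> Set[str]:
    """Elements of set_a that match at least one element of set_b."""
    candidates = list(set_b)  # materialise so it can be rescanned per element
    # any() short-circuits at the first matching partner
    return {a for a in set_a if any(match(a, b) for b in candidates)}


def matching_pairs(
    set_a: Iterable[str],
    set_b: Iterable[str],
    match: Callable[[str, str], bool],
) -> Set[Tuple[str, str]]:
    """Every (a, b) pair from the Cartesian product that satisfies match."""
    return {(a, b) for a, b in product(set_a, set_b) if match(a, b)}


set_a = {'abcd', 'efgh', 'ghij'}
set_b = {'abce', 'efgk', 'ghij'}


def same_prefix(a: str, b: str) -> bool:
    """Hypothetical stand-in predicate: same first three characters."""
    return a[:3] == b[:3]


print(sorted(fuzzy_intersection(set_a, set_b, same_prefix)))
# ['abcd', 'efgh', 'ghij']
```

Because a threshold-based "equality" like this is not transitive, it can't serve as a set's __eq__/__hash__, which is why set operations have no comparator hook. The result also depends on which set you iterate over; matching_pairs avoids choosing a side.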