How do I verify an exact word occurs in a string?
I need to account for cases where a word such as "king" is immediately followed by a question mark, as in the examples below.
Unigrams: this should be False
In [1]: answer = "king"
In [2]: context = "we run with the king? on sunday"
n_grams: this should be False
In [1]: answer = "king tut"
In [2]: context = "we run with the king tut? on sunday"
Unigrams: this should be True
In [1]: answer = "king"
In [2]: context = "we run with the king on sunday"
n_grams: this should be True
In [1]: answer = "king tut"
In [2]: context = "we run with the king tut on sunday"
As people mentioned, for the unigram case we can handle it by splitting the string into a list, but that doesn't work for n_grams.
After reading some posts, I think I should try to handle this with a lookbehind, but I'm not sure.
return answer in context.split()
>>> answer in context.split()
False
You don't need a regex for this.
If you're looking for keywords:
all([ans in context.split() for ans in answer.split()])
will work with "king tut", but that depends on whether you also want to match strings like:
"we tut with the king"
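To make that caveat concrete, here is a small sketch of the keyword check wrapped in a function. The name `all_keywords_in` is made up for illustration. It shows that the check ignores word order and adjacency, while still rejecting "tut?" as a token:

```python
def all_keywords_in(answer, context):
    # Hypothetical helper: True when every word of `answer` appears
    # as a whole whitespace-separated token somewhere in `context`.
    tokens = context.split()
    return all(word in tokens for word in answer.split())


print(all_keywords_in("king tut", "we run with the king tut on sunday"))   # True
print(all_keywords_in("king tut", "we tut with the king"))                 # True (order ignored)
print(all_keywords_in("king tut", "we run with the king tut? on sunday"))  # False
```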
If you don't, you still don't need a regex (although you should probably use one), given that you want to consider only whole terms (which .split() separates properly by default):
def ngram_in(match, string):
    # Compare whole whitespace-separated tokens, so "king?" never equals "king".
    matches = match.split()
    words = string.split()
    if not matches:
        # An empty match has no terms to look for.
        return False
    if len(matches) == 1:
        return matches[0] in words
    words_len = len(words)
    matches_len = len(matches)
    for index, word in enumerate(words):
        # Not enough words left for the whole n-gram to fit.
        if index + matches_len > words_len:
            return False
        if word == matches[0]:
            potential_match = True
            for match_index, term in enumerate(matches):
                if words[index + match_index] != term:
                    potential_match = False
                    break
            if potential_match:
                return True
    return False
This is O(n*m) in the worst case (n words in the string, m terms in the match) and about half as fast as a regex on normal strings.
>>> ngram_in("king", "was king tut a nice dude?")
True
>>> ngram_in("king", "was king? tut a nice dude?")
False
>>> ngram_in("king tut a", "was king tut a nice dude?")
True
>>> ngram_in("king tut a", "was king tut? a nice dude?")
False
>>> ngram_in("king tut a", "was king tut an nice dude?")
False
>>> ngram_in("king tut", "was king tut an nice dude?")
True
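Since the answer says a regex is probably the better tool, and the question asks about lookbehinds, here is a minimal sketch of that route. The function name `ngram_in_re` is an assumption for illustration, not part of the original answer. It requires the n-gram to be bounded by whitespace or the string edges on both sides, using a negative lookbehind `(?<!\S)` and a negative lookahead `(?!\S)`:

```python
import re


def ngram_in_re(match, string):
    # Hypothetical regex variant of ngram_in.
    # Escape each term so characters like "?" or "." are matched literally.
    terms = [re.escape(term) for term in match.split()]
    if not terms:
        return False
    # (?<!\S): no non-whitespace character directly before the n-gram.
    # (?!\S):  no non-whitespace character directly after it.
    pattern = r"(?<!\S)" + r"\s+".join(terms) + r"(?!\S)"
    return re.search(pattern, string) is not None


print(ngram_in_re("king", "we run with the king? on sunday"))          # False
print(ngram_in_re("king tut", "we run with the king tut? on sunday"))  # False
print(ngram_in_re("king", "we run with the king on sunday"))           # True
print(ngram_in_re("king tut", "we run with the king tut on sunday"))   # True
```

Like the split-based version, this treats any run of whitespace between terms as a separator, and it rejects a term with punctuation attached because the lookahead sees a non-whitespace character.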