I have this script that does a word search in text. The search goes pretty good and results work as expected. What I'm trying to achieve is extract n
words close to the match. For example:
The world is a small place, we should try to take care of it.
Suppose I'm looking for place
and I need to extract the 3 words on the right and the 3 words on the left. In this case they would be:
left -> [is, a, small]
right -> [we, should, try]
What is the best approach to do this?
Thanks!
def search(text,n):
'''Searches for text, and retrieves n words either side of the text, which are retuned seperatly'''
word = r"\W*([\w]+)"
groups = re.search(r'{}\W*{}{}'.format(word*n,'place',word*n), text).groups()
return groups[:n],groups[n:]
This allows you to specify how many words either side you want to capture. It works by constructing the regular expression dynamically. With
t = "The world is a small place, we should try to take care of it."
search(t,3)
(('is', 'a', 'small'), ('we', 'should', 'try'))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With