 

Python find n-sized window around phrase within string

Tags:

python

I have a string, for example 'i cant sleep what should i do', as well as a phrase that is contained in the string, 'cant sleep'. What I am trying to accomplish is to get an n-sized window around the phrase, even if there aren't n words on either side. So in this case, with a window size of 2 (2 words on either side of the phrase), I would want 'i cant sleep what should'.

This is my current solution, which attempts to find a window of size 2. However, it fails when the number of words to the left or right of the phrase is less than 2. I would also like to be able to use different window sizes.

import re
sentence = 'i cant sleep what should i do'
phrase = 'cant sleep'
sentence_words = re.findall(r'\w+', sentence)
phrase_words = re.findall(r'\w+', phrase)

# Positions of the first and last words of the phrase within the sentence
left = sentence_words.index(phrase_words[0])
right = sentence_words.index(phrase_words[-1])
# Here left is 1, so left-2 is -1 and the slice comes back empty
print(sentence_words[left-2:right+3])
GNMO11 asked Dec 15 '22 11:12


2 Answers

You can use the partition method for a non-regex solution:

>>> s='i cant sleep what should i do'
>>> p='cant sleep'
>>> lh, _, rh = s.partition(p)

Then use a slice to get up to two words:

>>> n=2
>>> ' '.join(lh.split()[:n]), p, ' '.join(rh.split()[:n])
('i', 'cant sleep', 'what should')

Your exact output:

>>> ' '.join(lh.split()[:n]+[p]+rh.split()[:n])
'i cant sleep what should'

You would want to check whether p is in s, or whether the partition succeeded, of course. If p is not found, s.partition(p) returns (s, '', '').


As pointed out in the comments, the slice on lh should use a negative index so that it takes the last n words before the phrase rather than the first n (thanks Mathias Ettinger):

>>> s='w1 w2 w3 w4 w5 w6 w7 w8 w9'
>>> p='w4 w5'
>>> n=2
>>> lh, _, rh = s.partition(p)
>>> ' '.join(lh.split()[-n:]+[p]+rh.split()[:n])
'w2 w3 w4 w5 w6 w7'
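
The pieces above (partition, the found check, and the negative slice on the left side) could be wrapped up roughly as follows. This is only a sketch: the function name window_by_partition and the choice to return None when the phrase is missing are assumptions, not part of the original answer. Note that partition matches substrings rather than whole words, and that a slice of [-0:] would return the whole list, so n == 0 is handled separately.

```python
def window_by_partition(sentence, phrase, n):
    """Return the first occurrence of phrase with up to n words on each side.

    Hypothetical helper: returns None when phrase is not in sentence.
    """
    # str.partition returns (sentence, '', '') when phrase is absent,
    # so an empty separator means "not found".
    left, sep, right = sentence.partition(phrase)
    if not sep:
        return None

    # [-n:] keeps the LAST n words before the phrase; guard n == 0
    # because [-0:] is the same as [0:] and would keep everything.
    left_words = left.split()[-n:] if n > 0 else []
    right_words = right.split()[:n] if n > 0 else []

    return ' '.join(left_words + [phrase] + right_words)


print(window_by_partition('i cant sleep what should i do', 'cant sleep', 2))
print(window_by_partition('w1 w2 w3 w4 w5 w6 w7 w8 w9', 'w4 w5', 2))
```

Because slicing never raises when fewer than n words are available, the short-left-side case from the question needs no special handling.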
dawg answered Dec 31 '22 04:12


If you define words as entities separated by whitespace, you can split your sentence and use regular Python slicing:

def get_window(sentence, phrase, window_size):
    sentence = sentence.split()
    phrase = phrase.split()
    words = len(phrase)

    for i,word in enumerate(sentence):
        if word == phrase[0] and sentence[i:i+words] == phrase:
            start = max(0, i-window_size)
            return ' '.join(sentence[start:i+words+window_size])

sentence = 'i cant sleep what should i do'
phrase = 'cant sleep'
print(get_window(sentence, phrase, 2))

You can also turn it into a generator by changing return to yield, which lets you produce every window when the phrase occurs several times in the sentence (the generator version is called gen_window here):

>>> list(gen_window('I dont need it, I need to get rid of it', 'need', 2))
['I dont need it, I', 'it, I need to get']
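
The answer does not show the body of gen_window, so the following is one possible reading of "change return to yield", written under that assumption. The early return for an empty phrase is an added guard, not something the original states.

```python
def gen_window(sentence, phrase, window_size):
    """Yield a window of words around every occurrence of phrase."""
    words = sentence.split()
    phrase_words = phrase.split()
    size = len(phrase_words)

    # Assumed guard: an empty phrase would otherwise match at every index.
    if size == 0:
        return

    for i in range(len(words) - size + 1):
        if words[i:i + size] == phrase_words:
            # Clamp at 0 so a negative start does not wrap around
            start = max(0, i - window_size)
            yield ' '.join(words[start:i + size + window_size])


print(list(gen_window('I dont need it, I need to get rid of it', 'need', 2)))
```

Since words are compared after a plain split(), punctuation stays attached to its word ('it,' is not equal to 'it'), which is why the final 'it' windows look the way they do.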
301_Moved_Permanently answered Dec 31 '22 04:12