I need to mark negative contexts in a sentence: every word between a negation word (not, never, n't, ...) and the next punctuation mark should get a _NEG suffix.
I have defined a regex to pick out all such occurrences:
import re

def replacenegation(text):
    match = re.search(r"((\b(never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint)\b)|\b\w+n't\b)((?![.:;!?]).)*[.:;!?\b]", text)
    if match:
        s = match.group()
        print(s)
        news = ""
        # Split the matched span into words and drop the negation word itself
        wlist = re.split(r"[.:;!? ]", s)
        wlist = wlist[1:]
        print(wlist)
        for w in wlist:
            if w:
                news = news + " " + w + "_NEG"
        print(news)
I can detect and replace the matched group. However, I don't know how to recreate the complete sentence after this operation. Also for multiple matches, match.groups() gives me wrong output.
For example, if my input sentence is:
I don't like you at all; I should not let you know my happiest secret.
Output should be:
I don't like_NEG you_NEG at_NEG all_NEG ; I should not let_NEG you_NEG know_NEG my_NEG happiest_NEG secret_NEG .
How do I do this?
First of all, you should replace the negative look-ahead ((?![.:;!?]).)* with a negated character class, ([^.:;!?]*), which matches the same span more simply.
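As a quick check that the two forms capture the same span, here is a minimal sketch. The sample text and the single cue "not" are made up for illustration. One difference worth knowing: `.` does not match a newline by default, while the negated class does, so the two can differ on multi-line input.

```python
import re

text = "I should not let you know; really."

# Look-ahead form: test each character before consuming it.
lookahead = re.compile(r"not((?:(?![.:;!?]).)*)")
# Negated character class: consume a run of non-terminator characters.
negated = re.compile(r"not([^.:;!?]*)")

scope_a = lookahead.search(text).group(1)
scope_b = negated.search(text).group(1)

print(repr(scope_a))       # the words after "not", up to the ";"
print(scope_a == scope_b)  # both forms capture the same span here
```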
Then use a non-capturing group for the negative words and remove the extra groups: your pattern wraps them in three capture groups, so each match returns the negative word (for example, not) three times. After that, you can use re.findall() to find all the matches:
>>> import re
>>> s = "I don't like you at all; I should not let you know my happiest secret."
>>> regex = re.compile(r"(\b(?:never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint)\b|\b\w+n't\b)([^.:;!?]*)([.:;!?])")
>>> regex.findall(s)
[("don't", ' like you at all', ';'), ('not', ' let you know my happiest secret', '.')]
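To see why the extra groups matter, here is a small sketch with a shortened, hypothetical cue list (only "not" and "never"). re.findall() returns one tuple entry per capturing group, so nested groups around the cue repeat it in every result.

```python
import re

s = "I should not let you know."

# Three nested capturing groups around the cue, similar to the question's pattern.
nested = re.compile(r"((\b(not|never)\b))([^.:;!?]*)")
# One capturing group for the cue; the alternation itself is non-capturing.
flat = re.compile(r"(\b(?:not|never)\b)([^.:;!?]*)")

nested_result = nested.findall(s)
flat_result = flat.findall(s)

print(nested_result)  # the cue appears three times in each tuple
print(flat_result)    # the cue appears once, followed by its scope
```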
Or, to replace the words, you can use re.sub() with a lambda function as the replacement:
>>> regex.sub(lambda x:x.group(1)+' '+' '.join([i+'_NEG' for i in x.group(2).split()])+x.group(3) ,s)
"I don't like_NEG you_NEG at_NEG all_NEG; I should not let_NEG you_NEG know_NEG my_NEG happiest_NEG secret_NEG."
Note that to keep the punctuation you need to put it in a capture group too, so that you can append it to the end of each rewritten clause inside re.sub().
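Putting the pieces together, the approach could be wrapped in a reusable function along these lines. This is only a sketch: the name mark_negation, the shortened cue list, and the optional terminator (so that a clause without closing punctuation is still tagged) are my own assumptions, not part of the pattern above.

```python
import re

# Shortened cue list for illustration; extend it as needed.
NEGATION = re.compile(
    r"(\b(?:never|no|nothing|nowhere|noone|none|not)\b|\b\w+n't\b)"  # negation cue
    r"([^.:;!?]*)"  # scope: everything up to the next terminator
    r"([.:;!?]?)"   # optional terminator, so a trailing clause still matches
)

def mark_negation(text):
    """Append _NEG to each word between a negation cue and the next terminator."""
    def tag(match):
        cue, scope, end = match.groups()
        words = [w + "_NEG" for w in scope.split()]
        return " ".join([cue] + words) + end
    return NEGATION.sub(tag, text)

sentence = "I don't like you at all; I should not let you know my happiest secret."
print(mark_negation(sentence))
```

Because the replacement is a named function instead of a lambda, the three groups can be unpacked and handled separately, which keeps the tagging logic readable if more rules are added later. Note that whitespace inside the scope is normalised to single spaces.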