
Python regex selection of verbs with present perfect

In a given string, I am trying to catch verbs that are in the present perfect tense. I do that with the following regular expression in Python:

import re
sentence = "The Batman has never shown his true identity but has done so much good for Gotham City"

verb = re.findall(r'has\s[^\,\.\"]{0,50}done', sentence)

And the outcome is:

>>> print(verb)

['has never shown his true identity but has done']

Here the correct answer would have been 'has done', but the match starts at the 'has' of 'has never shown', which is the wrong one. The part [^\,\.\"]{0,50} allows some freedom in what may appear between 'has' and 'done'; that freedom is not needed in this example but is useful on my real data. However, the match starts at the first 'has' it finds, which is not always the right one. Is it possible to match from the last 'has' instead?
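To illustrate the behaviour described above, here is a minimal sketch that reuses the sentence from the question. It suggests that switching to a lazy quantifier alone does not help: the regex engine reports the leftmost position where a match can succeed, and laziness only affects where the match ends. The variable names are illustrative only.

```python
import re

sentence = "The Batman has never shown his true identity but has done so much good for Gotham City"

# Original (greedy) gap between "has" and "done".
greedy = re.findall(r'has\s[^,."]{0,50}done', sentence)

# Same pattern with a lazy gap: the match still begins at the first "has",
# because the engine tries start positions from left to right.
lazy = re.findall(r'has\s[^,."]{0,50}?done', sentence)

first = re.search(r'has\s[^,."]{0,50}?done', sentence)

print(greedy)
print(lazy)
print(first.start() == sentence.index('has'))
```

Both lists contain the same long match, so the fix has to prevent the gap from running across a second 'has' rather than merely shortening it.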

krasnapolsky asked Mar 25 '26 17:03



1 Answer

You can use a tempered greedy token solution here:

\bhas\s(?:(?!\bhas\b)[^,."]){0,50}?\bdone\b

See the regex demo.

Details

  • \bhas - a whole word has
  • \s - one whitespace char
  • (?:(?!\bhas\b)[^,."]){0,50}? - zero to fifty characters, as few as possible, where each character is anything other than a comma, a period or a double quote and does not start the whole word has
  • \bdone\b - a whole word done.
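The breakdown above can also be written directly into the pattern with re.VERBOSE, which ignores whitespace outside character classes and allows inline comments. This is only a sketch of the same regex; the name PRESENT_PERFECT and the second test sentence are made up for illustration.

```python
import re

PRESENT_PERFECT = re.compile(r'''
    \bhas            # whole word "has"
    \s               # one whitespace character
    (?:              # one gap character at a time:
        (?!\bhas\b)  #   fail if another whole word "has" starts here
        [^,."]       #   otherwise take any char except comma, period, double quote
    ){0,50}?         # 0 to 50 gap characters, as few as possible
    \bdone\b         # whole word "done"
''', re.VERBOSE)

sentence = "The Batman has never shown his true identity but has done so much good for Gotham City"
print(PRESENT_PERFECT.findall(sentence))

# Hypothetical sentence with words between "has" and "done".
other = "She has already done it, and he has not yet done it."
print(PRESENT_PERFECT.findall(other))
```

The negative lookahead is checked before every gap character, so an attempt that starts at an earlier 'has' fails as soon as the gap reaches the next 'has', and the engine moves on to try that later one.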

See a Python demo:

import re
sentence = "The Batman has never shown his true identity but has done so much good for Gotham City"
verb = re.findall(r'\bhas\s(?:(?!\bhas\b)[^,."]){0,50}?\bdone\b', sentence)
print(verb)
# => ['has done']
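If the participle varies across your real data, the same pattern could be wrapped in a small helper. The sketch below is an assumption about how one might do that, not part of the answer's demo: find_present_perfect, its participle argument and max_gap are hypothetical names, and re.escape is used so the participle is treated as literal text.

```python
import re

def find_present_perfect(text, participle, max_gap=50):
    """Return (start, end, match) for each 'has ... <participle>' whose gap contains no other 'has'."""
    # Hypothetical helper: builds the tempered pattern for a given participle.
    pattern = re.compile(
        r'\bhas\s(?:(?!\bhas\b)[^,."]){0,%d}?\b%s\b' % (max_gap, re.escape(participle))
    )
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]

sentence = "The Batman has never shown his true identity but has done so much good for Gotham City"
done_hits = find_present_perfect(sentence, "done")
shown_hits = find_present_perfect(sentence, "shown")
print(done_hits)
print(shown_hits)
```

Using finditer instead of findall keeps the match positions, which can help when checking results against the original text.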
Wiktor Stribiżew answered Mar 27 '26 07:03



