I'd like to match strings like:
45 meters?
45, meters?
45?
45 ?
but not strings like:
45 meters you?
45 you ?
45, and you?
In both cases, the question mark must be at the end of the string. Essentially, I want to exclude every string that contains the word "you".
I've tried the following regex:
'\d+.*(?!you)\?$'
but it still matches the second set of strings (probably because of the .*).
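A quick way to see why the attempt fails is to test it directly. The sketch below assumes each candidate is checked as a separate string, and the sample lists and variable names are illustrative only. The negative lookahead (?!you) is evaluated only at the position just before the final "?", where the next character is "?", so it never sees "you". It also shows one common alternative, which is a different technique from the capture-group trick in the answer: move the lookahead right after the digits so that it scans the rest of the string before anything else is consumed.

```python
import re

samples_ok = ["45 meters?", "45, meters?", "45?", "45 ?"]
samples_bad = ["45 meters you?", "45 you ?", "45, and you?"]

# The original attempt: .* runs up to the final "?", and only then is
# (?!you) checked. At that point the next character is "?", so it passes.
attempt = re.compile(r"\d+.*(?!you)\?$")
attempt_results = [bool(attempt.search(s)) for s in samples_bad]
print(attempt_results)  # [True, True, True]

# Alternative (not the answer's method): put the negative lookahead right
# after the digits so it checks the whole remainder for "you" up front.
fixed = re.compile(r"\d+(?!.*you).*\?")
ok_results = [bool(fixed.fullmatch(s)) for s in samples_ok]
bad_results = [bool(fixed.fullmatch(s)) for s in samples_bad]
print(ok_results)   # [True, True, True, True]
print(bad_results)  # [False, False, False]
```

Like the answer's pattern, this rejects any "you" substring, including words such as "your".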
There's a neat trick to exclude some matches from a regex, which you can use here:
>>> import re
>>> corpus = """
... 45 meters?
... 45?
... 45 ?
... 45 meters you?
... 45 you ?
... 45, and you?
... """
>>> pattern = re.compile(r"\d+[^?]*you|(\d+[^?]*\?)")
>>> re.findall(pattern, corpus)
['45 meters?', '45?', '45 ?', '', '', '']
The downside is that you get empty matches when the exclusion kicks in, but those are easily filtered out:
>>> list(filter(None, re.findall(pattern, corpus)))
['45 meters?', '45?', '45 ?']
How it works:
The trick is that we only pay attention to captured groups. Because the pattern has exactly one capturing group, re.findall returns that group's text for each match, or an empty string when the group didn't take part in the match. The left-hand side of the alternation, \d+[^?]*you (or "digits followed by non-? characters followed by 'you'"), matches what you don't want, and then we forget about it. Only if the left-hand side doesn't match is the right-hand side, (\d+[^?]*\?) (or "digits followed by non-? characters followed by '?'"), matched, and that one is captured.
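To make the "only captured groups count" idea visible, the sketch below uses re.finditer instead of re.findall on the same corpus and pattern, printing the whole match next to group 1. This is an illustrative variation, not part of the original answer. Left-hand-side matches still consume text (which is what stops the right-hand side from matching those lines), but they leave group 1 as None. Filtering on that replaces the filter(None, ...) step.

```python
import re

corpus = """
45 meters?
45?
45 ?
45 meters you?
45 you ?
45, and you?
"""

pattern = re.compile(r"\d+[^?]*you|(\d+[^?]*\?)")

# Every match consumes text, but only right-hand-side matches set group 1;
# left-hand-side ("...you") matches leave group 1 as None.
for m in pattern.finditer(corpus):
    print(repr(m.group(0)), "->", repr(m.group(1)))

# Keep only the matches where the capturing group took part.
wanted = [m.group(1) for m in pattern.finditer(corpus) if m.group(1) is not None]
print(wanted)  # ['45 meters?', '45?', '45 ?']
```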