Here is the example:
import re
a = "one two three four five six one three four seven two"
m = re.search("one.*four", a)
What I want is to find the substring from "one" to "four" that doesn't contain the substring "two" in between. The answer should be: m.group(0) = "one three four", m.start() = 28, m.end() = 41
Is there a way to do this with one search line?
You can use this pattern:
one(?:(?!two).)*four
Before matching any additional character we check we are not starting to match "two".
Working example: http://regex101.com/r/yY2gG8
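A minimal sketch of that pattern run from Python, in case the regex101 link is unavailable. The variable names (pattern, harder, m2) are only for illustration; harder is the longer string that appears further down the thread.

```python
import re

# Each "." is consumed only if the text at that position does not start with "two".
pattern = re.compile(r"one(?:(?!two).)*four")

a = "one two three four five six one three four seven two"
harder = a + " four"  # same string with an extra "four" after the final "two"

m = pattern.search(a)
print(m.group(0), m.span())    # one three four (28, 42)

m2 = pattern.search(harder)
print(m2.group(0), m2.span())  # one three four (28, 42)
```

The first "one" is rejected because the lookahead stops at "two" before any "four" can be reached, so the search moves on to the second "one".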
You can use the negative lookahead assertion (?!...):
re.findall("one(?!.*two).*four", a)
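With the harder string Satoru added, that returns an empty list, because the "two" after the last "four" also trips the lookahead. This works instead: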
>>> import re
>>> a = "one two three four five six one three four seven two"
>>> re.findall("one(?!.*two.*four).*four", a)
['one three four']
But - someday - you're really going to regret writing tricky regexps. If this were a problem I needed to solve, I'd do it like this:
for m in re.finditer("one.*?four", a):
    if "two" not in m.group():
        break
It's tricky enough that I'm using a minimal match there (.*?). Regexps can be a real pain :-(
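One caveat with the loop: if no match qualifies, m is left bound to the last match (or is never defined), and finditer skips any "one" that sits inside an earlier match. Here is a rough sketch of the same "search, then check" idea using plain string methods instead of a regexp. The helper name find_without and its parameters are made up for this example, and it returns the shortest qualifying substring for each starting point.

```python
def find_without(text, start="one", end="four", banned="two"):
    """Return (substring, (begin, stop)) for the first start...end run
    that does not contain banned, or None if there is no such run."""
    pos = text.find(start)
    while pos != -1:
        # Nearest end word to the right of this start word.
        stop = text.find(end, pos + len(start))
        if stop == -1:
            return None  # no end word remains anywhere to the right
        candidate = text[pos:stop + len(end)]
        if banned not in candidate:
            return candidate, (pos, stop + len(end))
        # Try the next occurrence of the start word, even an overlapping one.
        pos = text.find(start, pos + 1)
    return None


a = "one two three four five six one three four seven two"
print(find_without(a))                         # ('one three four', (28, 42))
print(find_without("one two one three four"))  # ('one three four', (8, 22))
print(find_without("one two four"))            # None
```

Checking only the nearest end word is enough here: if the shortest run from a given "one" already contains "two", every longer run from that same "one" does too.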
EDIT: LOL! But the messier regexp at the top fails yet again if you make the string harder still:
a = "one two three four five six one three four seven two four"
FINALLY: here's a correct solution:
>>> a = 'one two three four five six one three four seven two four'
>>> m = re.search("one([^t]|t(?!wo))*four", a)
>>> m.group()
'one three four'
>>> m.span()
(28, 42)
I know you said you wanted m.end() to be 41, but that was incorrect: m.end() is the index just past the last matched character, so a[28:42] is "one three four".