I've been looking around and could not make this happen. I am not totally noob.
I need to get text delimited by (including) START and END that doesn't contain START. Basically I can't find a way to negate a whole word without using advanced stuff.
Example string:
abcSTARTabcSTARTabcENDabc
The expected result:
STARTabcEND
Not good:
STARTabcSTARTabcEND
I can't use backward search stuff. I am testing my regex here: www.regextester.com
Thanks for any advice.
Try this
START(?!.*START).*?END
See it here online on Regexr
(?!.*START)
is a negative lookahead. It ensures that the word "START" is not following
.*?
is a non greedy match of all characters till the next "END". Its needed, because the negative lookahead is just looking ahead and not capturing anything (zero length assertion)
Update:
I thought a bit more, the solution above is matching till the first "END". If this is not wanted (because you are excluding START from the content) then use the greedy version
START(?!.*START).*END
this will match till the last "END".
START(?:(?!START).)*END
will work with any number of START...END
pairs. To demonstrate in Python:
>>> import re
>>> a = "abcSTARTdefENDghiSTARTjlkENDopqSTARTrstSTARTuvwENDxyz"
>>> re.findall(r"START(?:(?!START).)*END", a)
['STARTdefEND', 'STARTjlkEND', 'STARTuvwEND']
If you only care for the content between START
and END
, use this:
(?<=START)(?:(?!START).)*(?=END)
See it here:
>>> re.findall(r"(?<=START)(?:(?!START).)*(?=END)", a)
['def', 'jlk', 'uvw']
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With