Regex - Get string between two words that doesn't contain word

I've been looking around and could not make this work. I am not a total noob.

I need to get text delimited by (including) START and END that doesn't contain START. Basically I can't find a way to negate a whole word without using advanced stuff.

Example string:

abcSTARTabcSTARTabcENDabc

The expected result:

STARTabcEND

Not good:

STARTabcSTARTabcEND

I can't use backward search stuff. I am testing my regex here: www.regextester.com

Thanks for any advice.

rrr asked Sep 07 '11 11:09


2 Answers

Try this:

START(?!.*START).*?END

See it here online on Regexr

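(?!.*START) is a negative lookahead. It ensures that the word "START" does not occur anywhere after the current position (up to the end of the line, since in most regex flavors . does not match newlines by default). As a result, only the last START on a line can begin a match.

.*? is a non-greedy match of all characters up to the next "END". It's needed because the negative lookahead only looks ahead and does not consume any characters (it is a zero-length assertion).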
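
One way to check this behaviour is to run the pattern in Python (an assumption; the question does not name a regex flavor). This sketch uses the example string from the question plus a second, made-up string with two START...END pairs, to show that the lookahead rejects every START that has another START somewhere after it:

```python
import re

# Pattern from this answer: START that has no other START anywhere after it.
pattern = re.compile(r"START(?!.*START).*?END")

# Example string from the question.
question_input = "abcSTARTabcSTARTabcENDabc"
question_result = pattern.findall(question_input)
print(question_result)  # ['STARTabcEND']

# Hypothetical input with two complete pairs: only the last pair matches,
# because the first START still has a START further along the string.
two_pairs = "STARTaENDxSTARTbEND"
two_pairs_result = pattern.findall(two_pairs)
print(two_pairs_result)  # ['STARTbEND']
```

So the pattern gives the expected result for the question's example, but it finds at most one match per line.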

Update:

After thinking about it a bit more: the solution above matches up to the first "END". If this is not wanted (because you are only excluding START from the content), use the greedy version:

START(?!.*START).*END

This will match up to the last "END".
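
The difference between the two versions only shows up when more than one "END" follows the last START. A minimal sketch in Python (the flavor and the sample string are assumptions, not from the question):

```python
import re

lazy = re.compile(r"START(?!.*START).*?END")   # non-greedy version
greedy = re.compile(r"START(?!.*START).*END")  # greedy version

# Hypothetical input: one START followed by two ENDs.
text = "abcSTARTabcENDdefENDghi"

lazy_result = lazy.findall(text)
greedy_result = greedy.findall(text)

print(lazy_result)    # ['STARTabcEND']        stops at the first END
print(greedy_result)  # ['STARTabcENDdefEND']  runs to the last END
```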

stema answered Oct 16 '22 06:10


START(?:(?!START).)*END

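will work with any number of START...END pairs. (?:(?!START).)* matches one character at a time, but only at positions where "START" does not begin, so a match can never contain a second START. To demonstrate in Python: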

>>> import re
>>> a = "abcSTARTdefENDghiSTARTjlkENDopqSTARTrstSTARTuvwENDxyz"
>>> re.findall(r"START(?:(?!START).)*END", a)
['STARTdefEND', 'STARTjlkEND', 'STARTuvwEND']

If you only care about the content between START and END, use this:

(?<=START)(?:(?!START).)*(?=END)

See it here:

>>> re.findall(r"(?<=START)(?:(?!START).)*(?=END)", a)
['def', 'jlk', 'uvw']
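
Since the question mentions that "backward search stuff" (lookbehind) is not available, a capture group is one possible substitute for the lookbehind/lookahead pair. A minimal sketch, reusing the string a from above; whether the target tool gives access to capture groups is an assumption:

```python
import re

a = "abcSTARTdefENDghiSTARTjlkENDopqSTARTrstSTARTuvwENDxyz"

# START and END are matched but sit outside the parentheses;
# group 1 holds only the content between them.
contents = re.findall(r"START((?:(?!START).)*)END", a)
print(contents)  # ['def', 'jlk', 'uvw']
```

When the pattern has exactly one capture group, re.findall returns the contents of that group instead of the whole match, which is why the delimiters drop out of the result.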

Tim Pietzcker answered Oct 16 '22 04:10