How can I match the content between a start label and either an empty line or an end label with a regex?
For example, given this input (regex101 link):
<START> some text is here.
more text

unrelated text
<START> even more text.
text text
<STOP>
It should produce two matches:
<START> some text is here.
more text
and
<START> even more text.
text text
<STOP>
The regex I have come up with so far is as follows (but it matches the whole text, I assume because of the (?s).* part):
<START>((?s).*)(\s\s|<STOP>)
You should use a lazy quantifier so that .* matches as few characters as it can, i.e. use .*?:
(?s)(<START>.*?)(?:(?:\r*\n){2}|<STOP>)
The ending conditions you specified are left out of the capturing group:
(?:\r*\n){2} matches an empty line.
<STOP> matches the end label.
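To see the difference between the greedy and the lazy quantifier, here is a minimal sketch in Python. It assumes Python's re module (the question does not name a regex flavor) and assumes the sample input has an empty line after "more text", as the question describes. The greedy pattern is the question's attempt with (?s) moved to the front, since recent Python versions reject an inline flag in the middle of a pattern. The variable names are only illustrative.

```python
import re

# Sample input from the question. Assumption: an empty line follows
# "more text"; that empty line is what ends the first block.
text = (
    "<START> some text is here.\n"
    "more text\n"
    "\n"
    "unrelated text\n"
    "<START> even more text.\n"
    "text text\n"
    "<STOP>\n"
)

# The question's attempt, with (?s) moved to the start of the pattern.
# The greedy .* runs to the end of the input and only backtracks as far
# as the last place where an ending condition can match.
greedy = re.compile(r"(?s)<START>(.*)(\s\s|<STOP>)")

# The answer's pattern: the lazy .*? stops at the first ending condition.
lazy = re.compile(r"(?s)(<START>.*?)(?:(?:\r*\n){2}|<STOP>)")

greedy_matches = [m.group(0) for m in greedy.finditer(text)]
lazy_matches = lazy.findall(text)  # one capturing group, so a list of group 1

print(len(greedy_matches), "greedy match(es)")
for block in lazy_matches:
    print(repr(block))
```

With this input the greedy pattern yields a single match running from the first <START> to the final <STOP>, while the lazy pattern yields two blocks. Because the ending conditions sit outside the capturing group, the second block stops just before <STOP> and keeps the newline that precedes it; call .rstrip() on each block if that newline is unwanted.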
You can design your pattern like this (with the modifier m):
<START>[^\n<]*(?:(?:<(?!STOP>)|\n(?!$))[^\n<]*)*(?:<STOP>|\n$|\z)
The idea is to match everything that is not a < or a newline with [^\n<]*. When a < or a newline is reached, a negative lookahead checks that it is not followed by "STOP>" (for a <) or by an end of line (for a newline). If the lookahead succeeds, [^\n<]* (inside the non-capturing group this time) consumes everything up to the next < or newline. The group is repeated until <STOP>, two consecutive newlines, or the end of the string is reached.
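The pattern above can be tried out with the following sketch, again assuming Python's re module and a sample input with an empty line after "more text". Two adaptations are assumptions on my part: the m modifier is passed as re.MULTILINE, and \z is written as \Z, which is the spelling Python versions before 3.14 accept for "end of the string".

```python
import re

# Sample input from the question (assumed to contain an empty line
# after "more text"), plus a second input with an unterminated block.
text = (
    "<START> some text is here.\n"
    "more text\n"
    "\n"
    "unrelated text\n"
    "<START> even more text.\n"
    "text text\n"
    "<STOP>\n"
)
unterminated = "noise\n<START> tail without end label"

pattern = re.compile(
    r"<START>[^\n<]*"                      # start label and rest of its line
    r"(?:(?:<(?!STOP>)|\n(?!$))[^\n<]*)*"  # a '<' not opening <STOP>, or a newline not before an empty line
    r"(?:<STOP>|\n$|\Z)",                  # end label, empty line, or end of input
    re.MULTILINE,                          # lets $ match before every newline
)

# All groups are non-capturing, so findall returns the whole matches.
matches = pattern.findall(text)
tail_matches = pattern.findall(unterminated)

for block in matches:
    print(repr(block))
```

Unlike the lazy version, this pattern includes the terminator in the match: the first block ends with the newline that precedes the empty line, and the second block ends with <STOP>. No DOTALL flag is needed, because newlines are consumed explicitly by the \n alternative.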