Regex match from start label until empty line or end label

Tags:

regex

How can I match the content between a start label and either an empty line or an end label with a regex?

For example, given this input (regex101 link):

<START> some text is here. 
more text

unrelated text

<START> even more text. 
text text
<STOP>

It should produce two matches:

<START> some text is here. 
more text

and

<START> even more text. 
text text
<STOP>

The regex I have come up with so far is as follows, but it matches the whole text (I assume because of the (?s).* part):

<START>((?s).*)(\s\s|<STOP>)
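To see the behaviour described above, here is a minimal Python sketch (Python is only an illustration language here; the sample string is an assumption reconstructed from the example, with "\n" line endings and the trailing spaces kept). It passes (?s) as re.DOTALL, because Python 3.11 and later reject a global inline flag that is not at the start of the pattern. The greedy .* runs to the end of the input and backtracks only as far as the last <STOP>, so there is a single match covering everything. Simply making it lazy is not enough with this input either: \s\s also matches the trailing space plus newline after "here.", so the first block ends too early.

```python
import re

# Sample input from the question (assumption: "\n" line endings,
# trailing spaces kept as shown).
TEXT = (
    "<START> some text is here. \n"
    "more text\n"
    "\n"
    "unrelated text\n"
    "\n"
    "<START> even more text. \n"
    "text text\n"
    "<STOP>"
)

# The question's pattern; (?s) is given as re.DOTALL instead of inline.
GREEDY = re.compile(r"<START>(.*)(\s\s|<STOP>)", re.DOTALL)

# Same pattern with a lazy quantifier, for comparison.
LAZY_WS = re.compile(r"<START>(.*?)(\s\s|<STOP>)", re.DOTALL)

greedy_matches = [m.group(0) for m in GREEDY.finditer(TEXT)]
lazy_first = LAZY_WS.search(TEXT).group(1)

print(len(greedy_matches), greedy_matches[0] == TEXT)  # one match, spanning the whole input
print(repr(lazy_first))  # stops at the " \n" after "here."
```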


asked Sep 21 '15 22:09

tkja

2 Answers

You should use a lazy quantifier for .* so that it matches as little as possible. Using .*?:

(?s)(<START>.*?)(?:(?:\r*\n){2}|<STOP>)

The ending conditions you specified are left outside the capturing group:

(?:\r*\n){2} an empty line.
<STOP> the end label.
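A minimal Python sketch of this pattern, run on the sample text reconstructed from the question (an assumption: "\n" line endings, trailing spaces kept). findall returns group 1, which excludes the terminator, while group 0 of each match still contains the empty line or <STOP> that ended it. Note that the second block keeps the newline in front of <STOP>.

```python
import re

# Sample input from the question (assumption: "\n" line endings,
# trailing spaces kept as shown).
TEXT = (
    "<START> some text is here. \n"
    "more text\n"
    "\n"
    "unrelated text\n"
    "\n"
    "<START> even more text. \n"
    "text text\n"
    "<STOP>"
)

# Lazy .*? stops at the first empty line, (?:\r*\n){2}, or the first <STOP>.
# Group 1 holds the block; the terminator is matched outside it.
PATTERN = re.compile(r"(?s)(<START>.*?)(?:(?:\r*\n){2}|<STOP>)")

blocks = PATTERN.findall(TEXT)                       # group 1 of each match
full = [m.group(0) for m in PATTERN.finditer(TEXT)]  # whole matches

for block in blocks:
    print(repr(block))
```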


answered Oct 22 '22 21:10

Mariano

You can write the pattern like this (with the m modifier, so that $ matches at the end of every line):

<START>[^\n<]*(?:(?:<(?!STOP>)|\n(?!$))[^\n<]*)*(?:<STOP>|\n$|\z)


The idea is to match everything that is not a < or a newline with [^\n<]*. When a < or a newline is reached, negative lookaheads check that the < is not followed by "STOP>" and that the newline is not followed by an end of line (i.e. an empty line). If the negative lookahead succeeds, [^\n<]* (this time inside the non-capturing group) consumes everything up to the next < or newline. The group is repeated until <STOP>, two consecutive newlines, or the end of the string is reached.
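The same pattern as a Python sketch (an illustration only, on the sample text reconstructed from the question). Two assumptions about the port: re.MULTILINE stands in for the m modifier, and PCRE's \z is written \Z, which is how Python's re module spells the absolute end-of-string anchor. Unlike the first answer's capture group, the whole match here includes its terminator: the newline before the empty line, or <STOP>.

```python
import re

# Sample input from the question (assumption: "\n" line endings,
# trailing spaces kept as shown).
TEXT = (
    "<START> some text is here. \n"
    "more text\n"
    "\n"
    "unrelated text\n"
    "\n"
    "<START> even more text. \n"
    "text text\n"
    "<STOP>"
)

PATTERN = re.compile(
    r"<START>[^\n<]*"              # run of characters that are neither < nor newline
    r"(?:(?:<(?!STOP>)|\n(?!$))"   # a < not starting <STOP>, or a newline not followed by another newline / end
    r"[^\n<]*)*"                   # ...then the next run, repeated
    r"(?:<STOP>|\n$|\Z)",          # stop at <STOP>, a newline before an empty line, or end of input
    re.MULTILINE,                  # $ also matches just before each newline
)

matches = [m.group(0) for m in PATTERN.finditer(TEXT)]

for match in matches:
    print(repr(match))
```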

answered Oct 22 '22 23:10

Casimir et Hippolyte
