I have a text like this:
hello world /* select a from table_b
*/ some other text with new line cha
racter and there are some blocks of
/* any string */ select this part on
ly
////RESULT rest string
The text is multilined and I need to extract from last occurrence of "*/" until "////RESULT". In this case, the result should be:
select this part on
ly
How can I achieve this in Perl?
I tried \\\*/(.|\n)*////RESULT, but that starts matching at the first "*/".
A useful trick in cases like this is to prefix the regexp with the greedy pattern .*, which will try to match as many characters as possible before the rest of the pattern matches. So:
my ($match) = ($string =~ m!^.*\*/(.*?)////RESULT!s);
Let's break this pattern into its components:
^.* starts at the beginning of the string and matches as many characters as it can. (The s modifier allows . to match even newlines.) The beginning-of-string anchor ^ is not strictly necessary, but it ensures that the regexp engine won't waste too much time backtracking if the match fails.
\*/ just matches the literal string */.
(.*?) matches and captures any number of characters; the ? makes it ungreedy, so it prefers to match as few characters as possible in case there's more than one position where the rest of the regexp can match.
Finally, ////RESULT just matches itself.
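To check the breakdown above, here is a minimal sketch that runs the pattern against the sample text from the question. The heredoc, the comparison without the ^.* prefix, and the whitespace trimming are additions for illustration and are not part of the original pattern.

```perl
use strict;
use warnings;

# Sample text from the question, as one multi-line string.
my $string = <<'END';
hello world /* select a from table_b
*/ some other text with new line cha
racter and there are some blocks of
/* any string */ select this part on
ly
////RESULT rest string
END

# Without the greedy prefix, the match starts at the first "*/".
my ($from_first) = ($string =~ m!\*/(.*?)////RESULT!s);

# With the greedy ^.* prefix, the match starts at the last "*/"
# that is still followed by "////RESULT".
my ($match) = ($string =~ m!^.*\*/(.*?)////RESULT!s);

# Assumption: surrounding whitespace is unwanted, so trim a copy of the capture.
(my $trimmed = $match) =~ s/^\s+|\s+$//g;

print "[$trimmed]\n";
```

The raw capture keeps the space after "*/" and the newline before "////RESULT", which is why the sketch trims a copy before printing.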
Since the pattern contains a lot of slashes, and since I wanted to avoid leaning toothpick syndrome, I decided to use alternative regexp delimiters. Exclamation points (!) are a popular choice, since they rarely collide with regexp syntax (the main exception is negative lookaround, such as (?!...)).
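As a small illustration of the delimiter choice, the sketch below writes the same pattern three ways. The short test string is made up for this example, and the brace-delimited form is just one further alternative, not something the answer itself uses.

```perl
use strict;
use warnings;

# Hypothetical one-line input, chosen only to keep the example short.
my $string = "a */ b */ keep me ////RESULT tail";

# Default delimiters: every slash in the pattern must be escaped.
my ($with_slashes) = ($string =~ /^.*\*\/(.*?)\/\/\/\/RESULT/s);

# Exclamation points as delimiters: the slashes need no escaping.
my ($with_bangs) = ($string =~ m!^.*\*/(.*?)////RESULT!s);

# Braces are another common choice of delimiter.
my ($with_braces) = ($string =~ m{^.*\*/(.*?)////RESULT}s);

print "all three forms capture the same text\n"
    if $with_slashes eq $with_bangs and $with_bangs eq $with_braces;
```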
Edit: Per discussion with ikegami below, I guess I should note that, if you want to use this regexp as a sub-pattern in a longer regexp, and if you want to guarantee that the string matched by (.*?) will never contain ////RESULT, then you should wrap those parts of the regexp in an independent (?>) subexpression, like this:
my $regexp = qr!\*/(?>(.*?)////RESULT)!s;
...
my ($match) = ($string =~ /^.*$regexp$some_other_regexp/s);
The (?>) causes the pattern inside it to fail rather than accepting a suboptimal match (i.e. one that extends beyond the first substring matching ////RESULT), even if that means that the rest of the regexp will fail to match.
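The difference can be seen in a small sketch. The input string and the $tail pattern are hypothetical stand-ins (with $tail playing the role of $some_other_regexp), chosen so that the trailing pattern only matches after the second ////RESULT.

```perl
use strict;
use warnings;

# Hypothetical input with two "////RESULT" markers after a single "*/".
my $string = "x */ body ////RESULT one ////RESULT two";
my $tail   = qr/ two/;   # hypothetical stand-in for $some_other_regexp

my $plain  = qr!\*/(.*?)////RESULT!s;        # no atomic group
my $atomic = qr!\*/(?>(.*?)////RESULT)!s;    # with the independent subexpression

# Without (?>), (.*?) is allowed to grow past the first "////RESULT"
# so that $tail can match.
my ($loose) = ($string =~ /^.*$plain$tail/s);

# With (?>), the group commits to the first "////RESULT"; $tail then
# fails there, so the whole match fails and nothing is captured.
my ($guarded) = ($string =~ /^.*$atomic$tail/s);

print defined $loose   ? "plain:  [$loose]\n"   : "plain:  no match\n";
print defined $guarded ? "atomic: [$guarded]\n" : "atomic: no match\n";
```

Whether a failed match or a capture containing ////RESULT is preferable depends on the surrounding pattern; the atomic group only guarantees that the second outcome cannot happen.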