Regular expression for start and end of string in multiline mode

Tags: regex, perl, pcre

In a regular expression, in multiline mode, ^ and $ stand for the start and end of line. How can I match the end of the whole string?

In the string

Hello\nMary\nSmith\nHello\nJim\nDow

the expression

/^Hello(?:$).+?(?:$).+?$/ms

matches Hello\nMary\nSmith.

I wonder whether there is a metacharacter (like \ENDSTRING) that matches the end of the whole string, not just line, such that

/^Hello(?:$).+?(?:$).+?\ENDSTRING/ms

would match Hello\nJim\nDow. Similarly, a metacharacter to match the start of the whole string, not a line.

Alexander Gelbukh asked Oct 08 '17 00:10


1 Answer

There are indeed assertions for that, documented in perlre:

\A Match only at beginning of string
\Z Match only at end of string, or before newline at the end

...
The \A and \Z are just like ^ and $, except that they won't match multiple times when the /m modifier is used, while ^ and $ will match at every internal line boundary. To match the actual end of the string and not ignore an optional trailing newline, use \z.

Also see Assertions in perlbackslash.
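
To make the quoted difference concrete, here is a minimal sketch using the string from the question. It counts how many times "Hello" on a line of its own is found when the pattern is anchored with ^ versus \A, both under /m. The variable names are only illustrative.

```perl
use strict;
use warnings;

# Sample string from the question.
my $str = "Hello\nMary\nSmith\nHello\nJim\nDow";

# Under /m, ^ matches at the start of every line, so both "Hello" lines count.
my $caret_hits = () = $str =~ /^Hello$/mg;

# \A matches only at the very start of the string, even under /m.
my $A_hits = () = $str =~ /\AHello$/mg;

print "^  matched $caret_hits time(s)\n";   # 2
print "\\A matched $A_hits time(s)\n";      # 1
```

The `() = ... /g` idiom forces list context so that the number of matches can be assigned to a scalar.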

I am not sure what you're after in the shown example so here is another one

perl -wE'$_ = qq(one\ntwo\nthree); say for /(\w+\n\w+)\Z/m'

prints

two
three

while with $ instead of \Z it prints

one
two

Note that the above example would match qq(one\ntwo\nthree\n) as well (with a trailing newline), which may or may not be suitable. Please compare \Z and \z from the above quote for your actual needs. Thanks to ikegami for a comment.
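
As a rough illustration of that note, the following sketch runs the same capture against a string that ends in a newline, once with \Z and once with \z. The variable names are made up for the example.

```perl
use strict;
use warnings;

# Same data as above, but with a trailing newline.
my $with_nl = "one\ntwo\nthree\n";

# \Z may match just before a single trailing newline, so this succeeds.
my ($Z) = $with_nl =~ /(\w+\n\w+)\Z/m;

# \z matches only at the absolute end of the string, which here comes
# after the final newline, so \w+ cannot reach it and the match fails.
my ($z) = $with_nl =~ /(\w+\n\w+)\z/m;

# Consuming the newline explicitly lets \z succeed.
my ($z_nl) = $with_nl =~ /(\w+\n\w+\n)\z/m;

print defined $Z    ? "\\Z: matched\n"            : "\\Z: no match\n";
print defined $z    ? "\\z: matched\n"            : "\\z: no match\n";
print defined $z_nl ? "\\z with \\n: matched\n"   : "\\z with \\n: no match\n";
```

Here \Z captures two\nthree, the plain \z version does not match at all, and the last pattern captures two\nthree\n including the newline.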

zdim answered Sep 19 '22 17:09