 

How to use beginning and endline markers in regex for Java String?

Why doesn't the following change the text for me in Android?

String content = "test\n=test=\ntest";
content = content.replaceAll("^=(.+)=$", "<size:large>$1</size:large>");

It returns the original value unchanged. I would expect it to replace the middle line, =test=, with <size:large>test</size:large>.

What am I missing here?
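The behaviour is easy to reproduce in isolation. The sketch below runs on a plain JVM (nothing Android-specific is assumed), and the names AnchorDemo and anchoredMatch are made up for illustration. With no flags set, ^ and $ refer to the start and end of the whole input, so the pattern only matches when the entire string has the form =...=:

```java
import java.util.regex.Pattern;

// Hypothetical demo class: shows where ^ and $ can match without any flags.
public class AnchorDemo {
    // The question's pattern, compiled with no flags.
    static final Pattern LINE = Pattern.compile("^=(.+)=$");

    // Without MULTILINE, ^ can only match at the start of the whole input and
    // $ at its end, so find() succeeds only if the input itself looks like "=...=".
    static boolean anchoredMatch(String input) {
        return LINE.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(anchoredMatch("=test="));             // true
        System.out.println(anchoredMatch("test\n=test=\ntest")); // false

        // Because nothing matches, replaceAll returns the input unchanged.
        String content = "test\n=test=\ntest";
        String result = content.replaceAll("^=(.+)=$", "<size:large>$1</size:large>");
        System.out.println(result.equals(content));              // true
    }
}
```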

Edit: Okay, I understand why ^ and $ don't work. The point is that I need something that anchors the match to both the beginning and the end of a line, e.g. a line that contains only "=some text=". Most of the answers given aren't sufficient, for the following reasons:

=(.+)= has nothing to do with line boundaries, so it matches any line containing two = characters that aren't side by side.

.*=(.+)=.* matches the whole line, but has the same problem as the previous pattern.

\n=(.+)=\n gets closer, but won't match two such lines in a row (e.g. test\n=test=\n=test=\ntest). It also won't match an instance on the first or last line.

(?<=\n)=(.+)=(?=\n) almost works, but again won't match an instance on the first or last line.

(?<!.)=(.+)=(?!.) is the only one that seems to actually match every line that starts and ends with =, but $1 contains both the replacement and the original string.

content = content.replaceAll("(?<=(\n|^))=(.+)=(?=(\n|$))", "<size:large>$2</size:large>"); is the only answer that seems to actually do what it should.
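The trade-offs listed above can be checked by counting matches on the two awkward inputs: two matching lines in a row, and matching lines at the very start and end. This is a sketch only; LineMatchComparison and countMatches are hypothetical names, and the patterns are the ones quoted in the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class comparing the candidate patterns from the question.
public class LineMatchComparison {
    // Counts the non-overlapping matches that find() reports.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String consecutive = "test\n=test=\n=test=\ntest";
        String firstAndLast = "=test=\ntest\n=test=";

        // \n=(.+)=\n consumes the newline shared by two adjacent lines,
        // so only the first of the two =test= lines is found.
        System.out.println(countMatches("\n=(.+)=\n", consecutive));                   // 1
        // Lookarounds don't consume the newlines, so both adjacent lines match...
        System.out.println(countMatches("(?<=\n)=(.+)=(?=\n)", consecutive));          // 2
        // ...but there is no \n before the first line or after the last one.
        System.out.println(countMatches("(?<=\n)=(.+)=(?=\n)", firstAndLast));         // 0
        // Allowing ^ and $ as alternatives inside the lookarounds covers both edges.
        System.out.println(countMatches("(?<=(\n|^))=(.+)=(?=(\n|$))", firstAndLast)); // 2
    }
}
```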

asked Aug 22 '12 at 16:08 by noahy




2 Answers

Your original regex works fine if you turn on multiline mode, using (?m):

content = content.replaceAll("(?m)^=(.+)=$", "<size:large>$1</size:large>");

Now ^ and $ do indeed match at line boundaries.
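As a sketch of how this handles the cases raised in the question's edit (adjacent matching lines, and matching lines at the start or end of the input), here is the same one-line fix wrapped in a method; the class and method names are hypothetical:

```java
// Hypothetical wrapper around the (?m) fix from this answer.
public class MultilineReplace {
    // (?m) turns on MULTILINE, so ^ and $ match at every line boundary,
    // as well as at the start and end of the whole input.
    static String markLargeLines(String content) {
        return content.replaceAll("(?m)^=(.+)=$", "<size:large>$1</size:large>");
    }

    public static void main(String[] args) {
        // Middle line only.
        System.out.println(markLargeLines("test\n=test=\ntest"));
        // Two matching lines in a row.
        System.out.println(markLargeLines("test\n=test=\n=test=\ntest"));
        // Matching lines at the very start and very end of the input.
        System.out.println(markLargeLines("=test=\ntest\n=test="));
    }
}
```

Since . does not match a newline, (.+) cannot run across line boundaries, so each replacement stays within a single line.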

answered Sep 28 '22 at 00:09 by Alan Moore


The best way to deal with this is to set Pattern.MULTILINE. With MULTILINE, ^ and $ match at line boundaries (right after and right before a line terminator such as \n), and they still match at the beginning and the end of the input.

String.replaceAll doesn't take a flags argument, so you have to set MULTILINE inside the pattern with the embedded flag expression (?m):

content = content.replaceAll("(?m)^=(.+)=$", "<size:large>$1</size:large>");
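If you're free to use the java.util.regex classes directly, the Pattern.MULTILINE constant mentioned above can be passed to Pattern.compile instead of embedding (?m). A minimal sketch, with hypothetical class and method names:

```java
import java.util.regex.Pattern;

// Hypothetical class: same regex, but MULTILINE is passed as a compile flag.
public class MultilineFlag {
    // Equivalent to "(?m)^=(.+)=$"; the flag is given to Pattern.compile
    // instead of being written into the pattern text.
    private static final Pattern LARGE_LINE =
            Pattern.compile("^=(.+)=$", Pattern.MULTILINE);

    static String markLargeLines(String content) {
        return LARGE_LINE.matcher(content).replaceAll("<size:large>$1</size:large>");
    }

    public static void main(String[] args) {
        System.out.println(markLargeLines("=test=\n=test=\n=test=\n=test="));
    }
}
```

String.replaceAll compiles its regex on every call, so holding a compiled Pattern in a constant is also a reasonable choice when the same replacement runs many times.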

If you don't use MULTILINE, you need a positive lookbehind and lookahead for the \n. The regex then gets more complicated, because it also has to match the first line, and the last line when there's no trailing \n, e.g. for the input =test=\n=test=\n=test=\n=test=:

String pattern = "(?<=(^|\n))=(.+)=(?=(\n|$))";
content = content.replaceAll(pattern, "<size:large>$2</size:large>");

In this pattern the lookbehind accepts either the beginning of input or \n, (^|\n), and the lookahead accepts either \n or the end of input, (\n|$). Notice that the replacement has to use $2: the alternation in the lookbehind is itself a capturing group (group 1), which pushes (.+) to group 2.
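The group numbering can be confirmed with Matcher.groupCount(). This sketch (hypothetical class and method names) compiles the capturing version without any flags and applies it to the four-line input from above:

```java
import java.util.regex.Pattern;

// Hypothetical class exercising the capturing-lookaround pattern.
public class LookaroundGroups {
    // No MULTILINE: the lookbehind accepts "start of input or \n" and the
    // lookahead accepts "\n or end of input". Both alternations capture,
    // so the pattern has three groups and (.+) is group 2.
    static final Pattern CAPTURING = Pattern.compile("(?<=(^|\n))=(.+)=(?=(\n|$))");

    static String markLargeLines(String content) {
        return CAPTURING.matcher(content).replaceAll("<size:large>$2</size:large>");
    }

    public static void main(String[] args) {
        System.out.println(CAPTURING.matcher("").groupCount()); // 3
        System.out.println(markLargeLines("=test=\n=test=\n=test=\n=test="));
    }
}
```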

We can avoid the extra groups by putting the alternatives in the lookbehind and lookahead inside non-capturing groups, which look like (?:...):

String pattern = "(?<=(?:^|\n))=(.+)=(?=(?:\n|$))";
content = content.replaceAll(pattern, "<size:large>$1</size:large>");

Now we're back to using $1 as the captured group in the replacement.
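A quick check that the non-capturing version has a single group and still handles matching lines at the start and end of the input. As before, the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

// Hypothetical class exercising the non-capturing-lookaround pattern.
public class NonCapturingLookarounds {
    // (?:...) groups the alternatives without capturing, so (.+) is group 1.
    static final Pattern NON_CAPTURING =
            Pattern.compile("(?<=(?:^|\n))=(.+)=(?=(?:\n|$))");

    static String markLargeLines(String content) {
        return NON_CAPTURING.matcher(content).replaceAll("<size:large>$1</size:large>");
    }

    public static void main(String[] args) {
        System.out.println(NON_CAPTURING.matcher("").groupCount()); // 1
        // First and last lines match even though no \n surrounds them.
        System.out.println(markLargeLines("=test=\ntest\n=test="));
    }
}
```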

answered Sep 28 '22 at 01:09 by pb2q