I know $ is used to check if a line end follows in a Java regular expression.
For the following code:
String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(\\.[^:/]+).*$?", "$1");
System.out.println(test_domain);
The output is:
http://www.google.com
line2
line3
I assume that the pattern (\\.[^:/]+).*$? matches the first line, which is http://www.google.com/path, and that $1 is http://www.google.com. The ? makes the match reluctant (so it matches only the first line).
However, if I remove the ? from the pattern and run the following code:
String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(\\.[^:/]+).*$", "$1");
System.out.println(test_domain);
The output is:
http://www.google.com/path
line2
line3
I expected the result http://www.google.com, because (\\.[^:/]+) matches http://www.google.com and .*$ matches /path\nline2\nline3. Where is my misunderstanding of the regex here?
You have multiline input and are trying to use the $ anchor at the end of each line, but you are not using the MULTILINE flag. All you need is the (?m) inline modifier at the start of your regex:
String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(?m)(\\.[^:/]+).*$", "$1");
System.out.println(test_domain);
This will output:
http://www.google.com
line2
line3
Without MULTILINE or DOTALL mode, your regex (\.[^:/]+).*$ fails to match the input. The dot does not match newlines, so .* stops at the end of the first line. Meanwhile, $ (end of input) is only reached after two more newlines.
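You can check this directly by calling Matcher.find() instead of replaceFirst, because replaceFirst silently returns the input unchanged when nothing matches. This is a minimal sketch, and the class and helper names are only illustrative. It compiles the same pattern twice: once with no flags, and once with Pattern.MULTILINE, which is the flag-constant form of (?m).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorDemo {
    static final String INPUT = "http://www.google.com/path\nline2\nline3";

    // Returns the whole match (group 0) of the first find(), or null if nothing matches.
    static String firstMatch(String regex, int flags) {
        Matcher m = Pattern.compile(regex, flags).matcher(INPUT);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String regex = "(\\.[^:/]+).*$";
        // No flags: . stops at the first \n, and $ is only satisfied at the end of input,
        // so every attempt fails and find() returns false.
        System.out.println(firstMatch(regex, 0));                 // null
        // MULTILINE: $ also matches just before each line terminator.
        System.out.println(firstMatch(regex, Pattern.MULTILINE)); // .google.com/path
    }
}
```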
Your regex does not match the input string. Here, $ matches only at the end of the string (at the end of line3). Since you are not using the s (DOTALL) flag, . cannot get there.
Note that the $ anchor, even without the Pattern.MULTILINE option, can also match the position before a final line feed character; see What is the difference between ^ and \A , $ and \Z in regex?. This is easy to test: "a\nb\n".replaceAll("$", "X") returns "a\nbX\nX".
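Here is a small sketch that shows both modes side by side on the same "a\nb\n" test string. The markMatches helper is hypothetical and is just a thin wrapper around String.replaceAll.

```java
public class DollarDemo {
    // Inserts "X" at every position where the given regex matches.
    static String markMatches(String input, String regex) {
        return input.replaceAll(regex, "X");
    }

    public static void main(String[] args) {
        String s = "a\nb\n";
        // Default mode: $ matches before the final line terminator and at the very end of input.
        String plain = markMatches(s, "$");
        // MULTILINE mode: $ matches before every line terminator and at the end of input.
        String multi = markMatches(s, "(?m)$");
        // Escape the newlines so each result prints on a single line.
        System.out.println(plain.replace("\n", "\\n")); // a\nbX\nX
        System.out.println(multi.replace("\n", "\\n")); // aX\nbX\nX
    }
}
```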
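Also, a ? quantifier after the $ end-of-line/end-of-string anchor makes no sense. It makes the zero-width anchor optional, so the engine can always skip it, and in Java it is effectively ignored. That is why .*$? behaves like a plain .* here. Note that this ? is not a reluctant quantifier; a reluctant dot-star would be written .*?.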
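One way to confirm that $? acts as a no-op is to compare it with the same pattern without the anchor. This is a rough sketch with illustrative names. Both patterns should find the same text, and that match is what the question's first replaceFirst call replaced with $1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalAnchorDemo {
    static final String INPUT = "http://www.google.com/path\nline2\nline3";

    // Returns the whole match (group 0) of the first find(), or null if nothing matches.
    static String firstMatch(String regex) {
        Matcher m = Pattern.compile(regex).matcher(INPUT);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // $? may match zero times, so the engine accepts the position after .* without checking $.
        System.out.println(firstMatch("(\\.[^:/]+).*$?")); // .google.com/path
        // Same result with no anchor at all.
        System.out.println(firstMatch("(\\.[^:/]+).*"));   // .google.com/path
    }
}
```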
If you want to return just http://www.google.com, use the s (DOTALL) flag:
String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(?s)(\\.[^:/]+).*$", "$1");
System.out.println(test_domain);
Output:
http://www.google.com
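To see what the (?s) version actually consumes, you can inspect the match with Pattern.DOTALL, which is the flag-constant form of (?s). This is a minimal sketch, and the class and method names are illustrative. It shows that .* now runs to the end of the input, while group 1 still holds only the part that $1 puts back.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotAllDemo {
    static final String INPUT = "http://www.google.com/path\nline2\nline3";

    // Runs find() in DOTALL mode; returns the matcher positioned on the match, or null if none.
    static Matcher dotAllMatch(String input) {
        Matcher m = Pattern.compile("(\\.[^:/]+).*$", Pattern.DOTALL).matcher(input);
        return m.find() ? m : null;
    }

    public static void main(String[] args) {
        Matcher m = dotAllMatch(INPUT);
        if (m != null) {
            // .* crosses both newlines, and $ matches at the end of input.
            System.out.println(m.group().replace("\n", "\\n")); // .google.com/path\nline2\nline3
            // Group 1 is only the domain part.
            System.out.println(m.group(1));                      // .google.com
        }
    }
}
```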
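With the multiline (?m) flag, $ also matches before each line terminator. The regex looks for a literal . followed by a sequence of characters other than : and /. When the first : or / is reached, .*$ consumes the rest of that line only. Because replaceFirst is used, only this first match is replaced, and the following lines are left unchanged.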
String test_domain = "http://www.google.com/path\nline2\nline3";
test_domain = test_domain.replaceFirst("(?m)(\\.[^:/]+).*$", "$1");
System.out.println(test_domain);
Output:
http://www.google.com
line2
line3
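Applying the pattern to every line requires replaceAll instead of replaceFirst. This is a sketch that uses a hypothetical two-URL input chosen only for illustration. With replaceFirst only the first line is trimmed, while with replaceAll every line that contains a match is trimmed.

```java
public class FirstVsAllDemo {
    static final String REGEX = "(?m)(\\.[^:/]+).*$";

    // Trims only the first line that matches.
    static String trimFirst(String input) {
        return input.replaceFirst(REGEX, "$1");
    }

    // Trims every line that matches.
    static String trimAll(String input) {
        return input.replaceAll(REGEX, "$1");
    }

    public static void main(String[] args) {
        // Hypothetical input, used only to show the difference between the two calls.
        String input = "http://www.google.com/path\nhttp://example.org/x";
        System.out.println(trimFirst(input)); // second line keeps its /x
        System.out.println(trimAll(input));   // both lines are trimmed
    }
}
```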