From http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html:
\Z The end of the input but for the final terminator, if any
\z The end of the input
But what does this mean in practice? Can you give me an example of when to use \Z versus \z?
In my test I thought that "StackOverflow\n".matches("StackOverflow\\z")
would return true and "StackOverflow\n".matches("StackOverflow\\Z")
would return false. But actually both return false. Where is the mistake?
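To make the two Javadoc definitions concrete, here is a minimal sketch that lists every index where each anchor, used on its own, can match. An anchor is zero-width, so each match is just a position. The class name AnchorPositions and the helper positions() are illustrative, not part of any API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorPositions {
    // Returns every index at which the zero-width pattern `anchor` matches in `text`.
    static List<Integer> positions(String anchor, String text) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(anchor).matcher(text);
        while (m.find()) {
            result.add(m.start()); // an anchor consumes nothing, so start() == end()
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "abc\n"; // the final '\n' sits at index 3; the end of input is index 4
        System.out.println("\\Z matches at " + positions("\\Z", text)); // [3, 4]
        System.out.println("\\z matches at " + positions("\\z", text)); // [4]
    }
}
```

So \Z accepts two positions here (just before the final line break and the very end), while \z accepts only the very end.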
The metacharacter \Z matches at the end of the entire string, or just before a final line terminator if the string has one.
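A small sketch of what "final line terminator, if any" allows in Java: a single trailing \n or \r\n is skipped by \Z, but two trailing line breaks are not. The class name FinalTerminator and the helper endsWithAbc() are hypothetical, chosen only for this illustration; behaviour with the UNIX_LINES flag (where only \n counts as a terminator) is not shown.

```java
import java.util.regex.Pattern;

class FinalTerminator {
    // True if "abc" immediately followed by the given anchor is found somewhere in text.
    static boolean endsWithAbc(String anchor, String text) {
        return Pattern.compile("abc" + anchor).matcher(text).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"abc", "abc\n", "abc\r\n", "abc\n\n"};
        for (String s : inputs) {
            // Show \r and \n literally so the input is readable in the output
            String shown = s.replace("\r", "\\r").replace("\n", "\\n");
            System.out.printf("%-10s \\Z=%b \\z=%b%n",
                    shown, endsWithAbc("\\Z", s), endsWithAbc("\\z", s));
        }
    }
}
```

For the four inputs this reports \Z as true, true, true, false and \z as true, false, false, false: only the string with no trailing terminator satisfies \z, and a doubled line break defeats \Z as well.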
Even though \Z and $ only match at the end of the string (when the option for the caret and dollar to match at embedded line breaks is off), there is one exception. If the string ends with a line break, then \Z and $ will also match at the position before that line break, not only at the very end of the string. This "enhancement" was introduced by Perl, and is copied by many regex flavors, including Java, .NET and PCRE. In Perl, when reading a line from a file, the resulting string will end with a line break. Reading a line from a file with the text "joe" results in the string joe\n. When applied to this string, both ^[a-z]+$ and \A[a-z]+\Z will match "joe".
If you only want a match at the absolute very end of the string, use \z (lower case z instead of upper case Z). \A[a-z]+\z does not match joe\n, because \z only matches after the line break, and the line break is not matched by the character class.
http://www.regular-expressions.info/anchors.html
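A minimal Java sketch of the "joe" example. Java's BufferedReader.readLine() strips the line terminator, so the literal "joe\n" is written by hand to stand in for a Perl-style line. The class name JoeExample and the helper found() are illustrative only; find() is used so that only the anchors, not the whole-input rule of matches(), decide the result.

```java
import java.util.regex.Pattern;

class JoeExample {
    // Searches for regex anywhere in text, like a traditional regex match.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String line = "joe\n"; // a line that still carries its terminator
        System.out.println(found("^[a-z]+$", line));     // true: $ matches before the final \n
        System.out.println(found("\\A[a-z]+\\Z", line)); // true: \Z matches before the final \n
        System.out.println(found("\\A[a-z]+\\z", line)); // false: \z only matches after the \n
    }
}
```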
The way I read this, "StackOverflow\n".matches("StackOverflow\\z")
should return false because your pattern does not include the newline.
"StackOverflow\n".matches("StackOverflow\\z\\n") => false
"StackOverflow\n".matches("StackOverflow\\Z\\n") => true
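The four String.matches() calls from the question and this answer can be run side by side. This is a small sketch; the class name MatchesDemo and the helper check() are hypothetical.

```java
class MatchesDemo {
    static final String INPUT = "StackOverflow\n";

    // String.matches() succeeds only if the regex matches the entire input.
    static boolean check(String regex) {
        return INPUT.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(check("StackOverflow\\z"));    // false
        System.out.println(check("StackOverflow\\Z"));    // false
        System.out.println(check("StackOverflow\\z\\n")); // false: \z fails before the \n
        System.out.println(check("StackOverflow\\Z\\n")); // true: \Z matches before the \n, then \n consumes it
    }
}
```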
Just checked it. It looks like when Matcher.matches() is invoked (as in your code, behind the scenes), \Z behaves like \z. However, when Matcher.find() is invoked, they behave differently, as expected. The following returns true:
Pattern p = Pattern.compile("StackOverflow\\Z");
Matcher m = p.matcher("StackOverflow\n");
System.out.println(m.find());
If you replace \Z with \z, it returns false.
I find this a little surprising...
I think the main problem here is the unexpected behavior of matches(): any match must consume the whole input string. Both of your examples fail because the regexes don't consume the linefeed at the end of the string. The anchors have nothing to do with it.
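A quick way to confirm that the anchors are irrelevant to matches(): drop them entirely and the result is still false, while any pattern that consumes the trailing linefeed succeeds. A minimal sketch; the class name WholeInput and the helper whole() are hypothetical.

```java
class WholeInput {
    static final String INPUT = "StackOverflow\n";

    static boolean whole(String regex) {
        return INPUT.matches(regex);
    }

    public static void main(String[] args) {
        // No anchors at all: still false, because the trailing \n is never consumed
        System.out.println(whole("StackOverflow"));
        // Consuming the line break, explicitly or via \s*, makes matches() succeed
        System.out.println(whole("StackOverflow\\n"));
        System.out.println(whole("StackOverflow\\s*"));
    }
}
```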
In most languages, a regex match can occur anywhere, consuming all, some, or none of the input string. And Java has a method, Matcher#find(), that performs this traditional kind of match. However, the results are the opposite of what you said you expected:
Pattern.compile("StackOverflow\\z").matcher("StackOverflow\n").find() //false
Pattern.compile("StackOverflow\\Z").matcher("StackOverflow\n").find() //true
In the first example, the \z needs to match the end of the string, but the trailing linefeed is in the way. In the second, the \Z matches before the linefeed, which is at the end of the string.
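I think Alan Moore provided the best answer, especially the crucial point that matches() silently anchors its regex argument at both ends, so the pattern has to consume the entire input. (That is effectively \A...\z rather than ^...$, because $ would still accept a trailing line break.)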
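To see how matches() anchors its pattern, this sketch compares it with find() on two explicitly anchored copies of the same regex. Wrapping the regex in a non-capturing group (?:...) is a design choice so that an alternation inside it stays fully anchored. The class and method names are hypothetical, and the comparison is only claimed for simple inputs like this one.

```java
import java.util.regex.Pattern;

class MatchesVsAnchors {
    static boolean viaMatches(String regex, String s) {
        return s.matches(regex);
    }

    // find() with \A and \z: must start at the beginning and end at the very end
    static boolean viaAz(String regex, String s) {
        return Pattern.compile("\\A(?:" + regex + ")\\z").matcher(s).find();
    }

    // find() with ^ and $ (no MULTILINE): $ may also match before a final line break
    static boolean viaCaretDollar(String regex, String s) {
        return Pattern.compile("^(?:" + regex + ")$").matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "StackOverflow\n";
        String regex = "StackOverflow";
        System.out.println(viaMatches(regex, s));     // false
        System.out.println(viaAz(regex, s));          // false, agrees with matches()
        System.out.println(viaCaretDollar(regex, s)); // true: $ accepts the final \n
    }
}
```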
I'd also like to add a few examples and a little more explanation.
\z matches only at the very end of the string.
\Z also matches at the very end of the string, but if the string ends with a \n, it also matches just before that \n.
Consider this program:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        Pattern p = Pattern.compile(".+\\Z"); // some word before the end of the string
        String text = "one\ntwo\nthree\nfour\n";
        Matcher m = p.matcher(text);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}
It will find 1 match and print "four".
Change \Z to \z, and it will not match anything: .+ cannot consume the final \n, and \z will not match before it.
However, this will also print "four", because there's no \n at the end:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        Pattern p = Pattern.compile(".+\\z");
        String text = "one\ntwo\nthree\nfour";
        Matcher m = p.matcher(text);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}
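The first answer notes that $ only behaves like \Z "when the option for the caret and dollar to match at embedded line breaks is off". In Java that option is Pattern.MULTILINE, and it changes $ but not \Z. A sketch on the same text as the first program; the class name MultilineDemo and the helper findAll() are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // Collects every match that find() reports, in order.
    static List<String> findAll(Pattern p, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree\nfour\n";
        // With MULTILINE, $ matches before every line terminator: [one, two, three, four]
        System.out.println(findAll(Pattern.compile(".+$", Pattern.MULTILINE), text));
        // \Z ignores MULTILINE and still only matches at the end or before the final \n: [four]
        System.out.println(findAll(Pattern.compile(".+\\Z", Pattern.MULTILINE), text));
    }
}
```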