What's the difference between \z and \Z in a regular expression, and when and how do I use each?

Tags:

regex

From http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html:

\Z  The end of the input but for the final terminator, if any
\z  The end of the input

But what does it mean in practice? Can you give me an example of when to use \Z and when to use \z?

In my test I thought that "StackOverflow\n".matches("StackOverflow\\z") would return true and "StackOverflow\n".matches("StackOverflow\\Z") would return false. But actually both return false. Where is the mistake?

589

asked Apr 25 '10 10:04

Mister M. Bean

4 Answers

Even though \Z and $ only match at the end of the string (when the option for the caret and dollar to match at embedded line breaks is off), there is one exception. If the string ends with a line break, then \Z and $ will match at the position before that line break, rather than at the very end of the string.

This "enhancement" was introduced by Perl, and is copied by many regex flavors, including Java, .NET and PCRE. In Perl, when reading a line from a file, the resulting string will end with a line break. Reading a line from a file with the text "joe" results in the string joe\n. When applied to this string, both ^[a-z]+$ and \A[a-z]+\Z will match "joe".

If you only want a match at the absolute very end of the string, use \z (lower case z instead of upper case Z). \A[a-z]+\z does not match joe\n. \z matches after the line break, which is not matched by the character class.

http://www.regular-expressions.info/anchors.html

The way I read this "StackOverflow\n".matches("StackOverflow\\z") should return false because your pattern does not include the newline.

"StackOverflow\n".matches("StackOverflow\\z\\n") => false
"StackOverflow\n".matches("StackOverflow\\Z\\n") => true
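
To check the quoted claims and the two lines above side by side, here is a minimal sketch. The class name AnchorCheck and the helpers found and whole are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Pattern;

class AnchorCheck {
    // Hypothetical helper: true if the regex matches somewhere in the input (Matcher.find).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Hypothetical helper: true only if the regex matches the whole input (same as String.matches).
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String joe = "joe\n";
        System.out.println(found("^[a-z]+$", joe));       // true: $ matches before the final line break
        System.out.println(found("\\A[a-z]+\\Z", joe));   // true: \Z matches before the final line break
        System.out.println(found("\\A[a-z]+\\z", joe));   // false: \z only matches after the line break

        String so = "StackOverflow\n";
        System.out.println(whole("StackOverflow\\Z\\n", so)); // true
        System.out.println(whole("StackOverflow\\z\\n", so)); // false
    }
}
```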

155

answered Oct 16 '22 14:10

Jakob Kruse

Just checked it. It looks like when Matcher.matches() is invoked (as happens behind the scenes in your code), \Z behaves like \z. However, when Matcher.find() is invoked, they behave differently, as expected. The following prints true:

Pattern p = Pattern.compile("StackOverflow\\Z");
Matcher m = p.matcher("StackOverflow\n");
System.out.println(m.find());

and if you replace \Z with \z it returns false.

I find this a little surprising...

answered Oct 16 '22 13:10

Eyal Schneider

I think the main problem here is the unexpected behavior of matches(): any match must consume the whole input string. Both of your examples fail because the regexes don't consume the linefeed at the end of the string. The anchors have nothing to do with it.

In most languages, a regex match can occur anywhere, consuming all, some, or none of the input string. And Java has a method, Matcher#find(), that performs this traditional kind of match. However, the results are the opposite of what you said you expected:

Pattern.compile("StackOverflow\\z").matcher("StackOverflow\n").find()  //false
Pattern.compile("StackOverflow\\Z").matcher("StackOverflow\n").find()  //true

In the first example, the \z needs to match the end of the string, but the trailing linefeed is in the way. In the second, the \Z matches before the linefeed, which is at the end of the string.
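
One way to see why matches() rejects both patterns is to compare where the find() match ends with the length of the input. This is only a sketch; the class MatchesVsFind and the helper matchEnd are hypothetical names used for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesVsFind {
    // Hypothetical helper: end index of the first find() match, or -1 if there is none.
    static int matchEnd(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String input = "StackOverflow\n"; // 14 characters, including the linefeed

        // matches() fails whenever the linefeed is left unconsumed, anchor or no anchor.
        System.out.println(input.matches("StackOverflow"));     // false
        System.out.println(input.matches("StackOverflow\\Z"));  // false
        System.out.println(input.matches("StackOverflow\\n"));  // true

        // find() succeeds with \Z, but the match stops one character short of the end.
        System.out.println(matchEnd("StackOverflow\\Z", input) + " of " + input.length()); // 13 of 14
        System.out.println(matchEnd("StackOverflow\\z", input));                           // -1
    }
}
```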

answered Oct 16 '22 13:10

Alan Moore

I think Alan Moore provided the best answer, especially the crucial point that matches() must consume the entire input, as if the regex were anchored at both ends.

I'd also like to add a few examples and a little more explanation.

\z matches only at the very end of the string.

\Z also matches at the very end of the string, but if the string ends with a \n, it also matches just before that \n.
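
The two rules above can be made visible by listing every index at which a bare anchor matches. This is a small sketch; the class AnchorPositions and the helper positions are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorPositions {
    // Hypothetical helper: start index of every (possibly empty) match of the regex in the input.
    static List<Integer> positions(String regex, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(positions("\\Z", "four\n")); // [4, 5]: before the \n and at the very end
        System.out.println(positions("\\z", "four\n")); // [5]: only at the very end
        System.out.println(positions("\\Z", "four"));   // [4]
        System.out.println(positions("\\z", "four"));   // [4]
    }
}
```

Without a trailing \n the two anchors match at exactly the same place, which is why they only differ on input that ends with a line terminator.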

Consider this program:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        Pattern p = Pattern.compile(".+\\Z"); // some word before the end of the string
        String text = "one\ntwo\nthree\nfour\n";
        Matcher m = p.matcher(text);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}

It will find one match and print "four".

Change \Z to \z, and it will not match anything: \z matches only at the very end of the string, after the final \n, and . does not match the \n that stands in between.

However, this will also print "four", because there's no \n at the end:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        Pattern p = Pattern.compile(".+\\z");
        String text = "one\ntwo\nthree\nfour";
        Matcher m = p.matcher(text);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}

answered Oct 16 '22 14:10

pavel_orekhov
