
Is this a bug in the Java regexp implementation?

Tags: java, regex

I'm trying to match the string iso_schematron_skeleton_for_xslt1.xsl against the regexp ([a-zA-Z|_])?(\w+|_|\.|-)+(@\d{4}-\d{2}-\d{2})?\.yang.

The expected result is false: the string should not match.

The problem is that the call to matcher.matches() never returns.

Is this a bug in the Java regexp implementation?

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelloWorld {
    private static final Pattern YANG_MODULE_RE = Pattern
            .compile("([a-zA-Z|_])?(\\w+|_|\\.|-)+(@\\d{4}-\\d{2}-\\d{2})?\\.yang");

    public static void main(String[] args) {
        final Matcher matcher = YANG_MODULE_RE.matcher("iso_schematron_skeleton_for_xslt1.xsl");
        System.out.println(matcher.matches());
    }
}

I'm using:

openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b15)
OpenJDK 64-Bit Server VM (build 25.181-b15, mixed mode)
rkosegi asked Oct 26 '18 08:10



1 Answer

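This is not a bug in the regex implementation; it is catastrophic backtracking. The pattern contains nested quantifiers: \w+ sits inside a group that is itself quantified with +, so a run of word characters can be split between group iterations in exponentially many ways (and _ is matched by two alternatives, which adds even more paths). For a non-matching string, a backtracking engine has to try all of these splits before it can report failure, so the call would return eventually, just not in any practical amount of time. It makes more sense to make a character class out of the alternation group, i.e. (\\w+|_|\\.|-)+ => [\\w.-]+.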
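As a rough sanity check on that rewrite, the sketch below compares the nested group and the flattened character class on a few short inputs. The class name, field names and sample strings are made up for illustration, and the inputs are deliberately short so the nested form has very little to backtrack over.

```java
import java.util.regex.Pattern;

final class CharClassEquivalence {
    // Group from the question: \w+ nested inside a group that is itself repeated.
    static final Pattern NESTED = Pattern.compile("(\\w+|_|\\.|-)+");
    // Flattened form: one character class under a single quantifier.
    static final Pattern FLAT = Pattern.compile("[\\w.-]+");

    // True when both patterns give the same full-match verdict for the input.
    static boolean agree(String input) {
        return NESTED.matcher(input).matches() == FLAT.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs, kept short on purpose.
        for (String s : new String[] {"a.b-c_d", "x1", "-", "a b", ""}) {
            System.out.println("[" + s + "] nested=" + NESTED.matcher(s).matches()
                    + " flat=" + FLAT.matcher(s).matches());
        }
    }
}
```

Both forms accept the same strings; the difference is that the character class gives the engine only one way to consume each character, so a failed match is rejected after a linear amount of work.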

Note that \w already matches _. Also, a | inside a character class matches a literal | char, and [a|b] matches a, | or b, so it seems you should remove the | from your first character class.
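A quick way to convince yourself of both points is a minimal check like the following. The class and method names are only illustrative; Pattern.matches is the standard static helper that requires a full-string match.

```java
import java.util.regex.Pattern;

final class CharClassPipe {
    // Thin wrapper around Pattern.matches, which requires the whole input to match.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Inside [...] the | is an ordinary character, not alternation.
        System.out.println(fullMatch("[a-zA-Z|_]", "|")); // true
        // Without the | in the class, a lone | is rejected.
        System.out.println(fullMatch("[a-zA-Z_]", "|"));  // false
        // \w already covers the underscore.
        System.out.println(fullMatch("\\w", "_"));        // true
    }
}
```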

Use

.compile("[a-zA-Z_]?[\\w.-]+(?:@\\d{4}-\\d{2}-\\d{2})?\\.yang")

Note that the date part now uses a non-capturing group, (?:...), instead of a capturing one: you are only checking for a match, not extracting substrings, so there is no reason to pay the overhead of capturing.

See the regex demo (as the pattern is used with matches() and thus requires a full string match, I added ^ and $ in the regex demo).
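For completeness, here is a small sketch of the fixed pattern run against the file name from the question. The class name, the isYangModule helper and the two .yang file names are hypothetical, chosen only to show one input with a revision date and one without.

```java
import java.util.regex.Pattern;

final class YangModuleNames {
    // Fixed pattern from the answer: single character class, non-capturing date group.
    static final Pattern YANG_MODULE_RE = Pattern.compile(
            "[a-zA-Z_]?[\\w.-]+(?:@\\d{4}-\\d{2}-\\d{2})?\\.yang");

    // matches() requires the whole file name to match the pattern.
    static boolean isYangModule(String fileName) {
        return YANG_MODULE_RE.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        // Input from the question: rejected, and rejected quickly.
        System.out.println(isYangModule("iso_schematron_skeleton_for_xslt1.xsl")); // false
        // Hypothetical file names, with and without a revision date.
        System.out.println(isYangModule("ietf-interfaces@2018-02-20.yang"));       // true
        System.out.println(isYangModule("example.yang"));                          // true
    }
}
```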

Wiktor Stribiżew answered Sep 21 '22 04:09