I'm trying to match the string iso_schematron_skeleton_for_xslt1.xsl against the regexp ([a-zA-Z|_])?(\w+|_|\.|-)+(@\d{4}-\d{2}-\d{2})?\.yang.
The expected result is false; it should not match.
The problem is that the call to matcher.matches() never returns.
Is this a bug in the Java regexp implementation?
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelloWorld {
    private static final Pattern YANG_MODULE_RE = Pattern
            .compile("([a-zA-Z|_])?(\\w+|_|\\.|-)+(@\\d{4}-\\d{2}-\\d{2})?\\.yang");

    public static void main(String[] args) {
        final Matcher matcher = YANG_MODULE_RE.matcher("iso_schematron_skeleton_for_xslt1.xsl");
        System.out.println(Boolean.toString(matcher.matches()));
    }
}
I'm using:
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b15)
OpenJDK 64-Bit Server VM (build 25.181-b15, mixed mode)
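This is not really a bug; it is what is usually called catastrophic backtracking, and it is caused by nested quantifiers. The \w+ is inside a group that is itself quantified with +, so a run of word characters can be split among the group's iterations in exponentially many ways (between any two word characters, the engine can either extend the current \w+ or start a new iteration). When the trailing \.yang fails, the backtracking engine tries those splits before it can report false, which is why a non-matching string like this one appears to hang. It makes more sense to turn the alternation group into a character class, i.e. (\\w+|_|\\.|-)+ => [\\w.-]+.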
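To make the "exponentially many ways" concrete, here is a small counting sketch. It is only an illustration under simplifying assumptions: it counts how many ways the \w+ branch alone could divide a run of n word characters among iterations of the outer +, ignoring the extra _ alternative, the optional first group and any internal engine optimizations, so the real amount of work can differ. The class and method names (PartitionCount, waysToSplit) are hypothetical.

```java
// Hypothetical helper: counts the ways a run of n word characters can be split
// into non-empty pieces, i.e. the splits (\w+)+ could try (only the \w+ branch).
public class PartitionCount {
    static long waysToSplit(int n) {
        long[] ways = new long[n + 1];
        ways[0] = 1; // empty prefix: exactly one (trivial) split
        for (int i = 1; i <= n; i++) {
            for (int j = 0; j < i; j++) {
                // the last iteration of \w+ covers characters j..i-1
                ways[i] += ways[j];
            }
        }
        return ways[n];
    }

    public static void main(String[] args) {
        // the word-character run before ".xsl" in the input from the question
        String run = "iso_schematron_skeleton_for_xslt1";
        // prints "33 chars -> 4294967296 splits" (that is, 2^32)
        System.out.println(run.length() + " chars -> " + waysToSplit(run.length()) + " splits");
    }
}
```

In this simplified model every extra character doubles the count (waysToSplit(n) is 2^(n-1)), which is why shorter file names still return quickly while this one seems to run forever.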
Note that \w already matches _. Also, a | inside a character class matches a literal | char: [a|b] matches a, | or b. So it seems you should remove the | from your first character class.
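Both character-class facts can be checked directly with Pattern.matches. In this minimal sketch the class name CharClassCheck is just for illustration.

```java
import java.util.regex.Pattern;

public class CharClassCheck {
    public static void main(String[] args) {
        // \w already includes the underscore, so a separate _ alternative is redundant
        System.out.println(Pattern.matches("\\w", "_"));        // true
        // inside [...], | is an ordinary character, not alternation
        System.out.println(Pattern.matches("[a|b]", "|"));      // true
        // so the original first class also accepts a pipe character
        System.out.println(Pattern.matches("[a-zA-Z|_]", "|")); // true
        // after removing the |, the pipe is rejected
        System.out.println(Pattern.matches("[a-zA-Z_]", "|"));  // false
    }
}
```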
Use:
.compile("[a-zA-Z_]?[\\w.-]+(?:@\\d{4}-\\d{2}-\\d{2})?\\.yang")
Note that you may use a non-capturing group, (?:...), instead of a capturing one to avoid overhead you do not need, since you are only checking for a match and not extracting substrings.
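Putting the suggested pattern into a complete program, a minimal sketch could look like this. The class name YangModuleCheck, the helper isYangModule and the sample file names are assumptions for illustration only; the pattern itself is the one suggested above.

```java
import java.util.regex.Pattern;

public class YangModuleCheck {
    // Suggested pattern: a single character class instead of a quantified
    // alternation, and a non-capturing group for the optional revision date.
    static final Pattern YANG_MODULE_RE =
            Pattern.compile("[a-zA-Z_]?[\\w.-]+(?:@\\d{4}-\\d{2}-\\d{2})?\\.yang");

    // Hypothetical helper: true if the whole file name matches the pattern
    static boolean isYangModule(String fileName) {
        return YANG_MODULE_RE.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isYangModule("iso_schematron_skeleton_for_xslt1.xsl")); // false
        System.out.println(isYangModule("ietf-interfaces@2018-02-20.yang"));        // true
        System.out.println(isYangModule("example.yang"));                           // true
        // no capturing groups are left in the pattern
        System.out.println(YANG_MODULE_RE.matcher("").groupCount());               // 0
    }
}
```

Because [\\w.-]+ is a single quantified character class, there is only one way for it to cover a given stretch of text, so when .yang is missing the engine has roughly a linear number of positions to retry and the original input is rejected right away.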
See the regex demo (since the pattern is used with matches() and thus requires a full-string match, I added ^ and $ in the regex demo).
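The ^ and $ in the demo only mimic what matches() already does in Java. The sketch below compares matches() on the unanchored pattern with find() on the anchored one, and shows that find() without anchors would accept a matching substring. The class name AnchorCheck and the input module.yang.bak are made up for the example.

```java
import java.util.regex.Pattern;

public class AnchorCheck {
    public static void main(String[] args) {
        String body = "[a-zA-Z_]?[\\w.-]+(?:@\\d{4}-\\d{2}-\\d{2})?\\.yang";
        Pattern unanchored = Pattern.compile(body);
        Pattern anchored = Pattern.compile("^" + body + "$");
        String input = "module.yang.bak";

        // matches() requires the whole input to match, like find() with ^...$
        System.out.println(unanchored.matcher(input).matches()); // false
        System.out.println(anchored.matcher(input).find());      // false
        // find() without anchors accepts the substring "module.yang"
        System.out.println(unanchored.matcher(input).find());    // true
    }
}
```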