I need a regex for Java generic type parameter names, so I tried:
^[A-Z](([A-Z_0-9])*[^_])?$
This means the type name should consist of one or more uppercase letters and digits, starting with an uppercase letter. An underscore '_' may be used as a separator, but not at the end, e.g. 'TT_A9'.
But to my surprise, my regex tool shows a match for 'Aa', 'AAa' and 'AA-'.
I wrote a simple test class to check:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTestPatternMatcher {
    // Input that should be rejected, because '-' is not an allowed character
    public static final String test = "AA-";

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("^[A-Z](([A-Z_0-9])*[^_])?$");
        Matcher matcher = pattern.matcher(test);
        // matches() requires the whole input to match the pattern
        System.out.println(test + " Matches ? " + matcher.matches());
    }
}
Output:
AA- Matches ? true
It's also true for AAa, but not for AA_.
It works if I use the regex ^[A-Z](([A-Z_0-9])*[^_a-z-])?$
but I don't understand why I need to exclude 'a-z' and '-'
when I'm only looking for uppercase characters.
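When you use a negated character class, as in [^_] in your original pattern, you tell the regex to consume any one character other than those listed in the class. It is not limited to the characters you allowed elsewhere in the pattern, so [^_] also matches 'a', '-' or a space. Your regex therefore matches either a single uppercase ASCII letter (the optional group is skipped), or a string of at least 2 chars that starts with an uppercase ASCII letter, ends with any char but _, and has any number of chars from the _, 0-9 and A-Z ranges in between.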
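To see this behaviour directly, here is a minimal sketch that runs the pattern from the question against a handful of inputs. The class name NegatedClassDemo and the helper matchesOriginal are made up for illustration; only the pattern itself comes from the question.

```java
import java.util.regex.Pattern;

class NegatedClassDemo {
    // The pattern from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile("^[A-Z](([A-Z_0-9])*[^_])?$");

    // Hypothetical helper: true if the whole input matches the original pattern.
    static boolean matchesOriginal(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"A", "TT_A9", "Aa", "AAa", "AA-", "AA_"};
        for (String s : inputs) {
            // [^_] accepts 'a' and '-' as the last char, so only "AA_" is rejected.
            System.out.println(s + " -> " + matchesOriginal(s));
        }
    }
}
```

Only AA_ is rejected, because the last character is the one thing [^_] refuses; every other trailing character is accepted.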
You are looking for a negative lookbehind anchored at the end of the string:
^[A-Z][A-Z_0-9]*$(?<!_)
                  ^^^^^^
See the regex demo
It will fail all matches where the _ is at the end of the string. The lookbehind is zero-width: the _ is not consumed, it is only checked for presence immediately before the end of the string. Thus the pattern will accept (match) a 1-char string consisting of an uppercase ASCII letter, or a longer string that starts with an uppercase ASCII letter and is followed by characters from the ranges defined in the [A-Z_0-9] character class, as long as the last one is not _.
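A short sketch of how the lookbehind pattern could be checked in Java, using the same Matcher.matches() approach as the test class in the question. The class name LookbehindDemo and the helper isValidTypeName are illustrative names, not part of the original code.

```java
import java.util.regex.Pattern;

class LookbehindDemo {
    // Pattern from the answer: allowed chars only, and no '_' right before the end.
    static final Pattern FIXED = Pattern.compile("^[A-Z][A-Z_0-9]*$(?<!_)");

    // Hypothetical helper: true if the whole input is an acceptable type name.
    static boolean isValidTypeName(String s) {
        return FIXED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"A", "TT_A9", "Aa", "AAa", "AA-", "AA_", "_A", "9A"};
        for (String s : inputs) {
            System.out.println(s + " -> " + isValidTypeName(s));
        }
    }
}
```

With this pattern only A and TT_A9 from the list are accepted; lowercase letters, '-', a leading digit or underscore, and a trailing underscore are all rejected.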
I also suggest removing all redundant groupings (you are not using the captured subtexts anyway).
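As an aside, the same rule can be written without a lookbehind by keeping the shape of the original pattern and replacing the negated class with a positive one for the last character. This is a different technique from the one suggested above, shown only as a sketch; the class name PositiveClassDemo and the helper isValidTypeName are hypothetical. The group is non-capturing, in line with the advice to drop unused capturing groups, and the ^ and $ anchors are omitted because Matcher.matches() already requires the whole input to match.

```java
import java.util.regex.Pattern;

class PositiveClassDemo {
    // One uppercase letter, optionally followed by allowed chars
    // that must end in an uppercase letter or digit (never '_').
    static final Pattern NAME = Pattern.compile("[A-Z](?:[A-Z_0-9]*[A-Z0-9])?");

    // Hypothetical helper: true if the whole input is an acceptable type name.
    static boolean isValidTypeName(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"A", "TT_A9", "Aa", "AA-", "AA_"};
        for (String s : inputs) {
            System.out.println(s + " -> " + isValidTypeName(s));
        }
    }
}
```

Stating what the last character may be, rather than what it may not be, avoids the surprise from the question: nothing outside A-Z, 0-9 and _ can slip in.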