Is it possible to subtract a matched character in a character class?
The Java docs have examples of character classes with subtraction:
[a-z&&[^bc]] - a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] - a through z, and not m through p: [a-lq-z] (subtraction)
I want to write a pattern that matches two pairs of word characters, where the two pairs are not the same:
1) "aaaa123" - should NOT match
2) "aabb123" - should match "aabb" part
3) "aa--123" - should NOT match
I am close with the following pattern:
([\w])\1([\w])\2
but of course it does not work in case 1, so I need to subtract the match of the first group. But when I try to do this:
Pattern p = Pattern.compile("([\\w])\\1([\\w&&[^\\1]])\\2");
I get an exception:
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence near index 17
([\w])\1([\w&&[^\1]])\2
                 ^
at java.util.regex.Pattern.error(Pattern.java:1713)
So it seems this does not work with group references, only with explicitly listed characters. The following pattern compiles with no problems:
Pattern p = Pattern.compile("([\\w])\\1([\\w&&[^a]])\\2");
Is there any other way to write such a pattern?
The expression \w matches any word character. Word characters are the alphanumeric characters (a-z, A-Z and 0-9) and the underscore (_). \W matches any non-word character, that is, any character other than the alphanumeric characters and the underscore.
The minus sign (-) indicates a range in a character class when it is not at the first position after the "[" opening bracket or the last position before the "]" closing bracket. Example: "[A-Z]" matches any uppercase character. Example: "[A-Z-]" or "[-A-Z]" matches any uppercase character or "-".
A regex pattern matches a target string. The pattern is composed of a sequence of atoms. An atom is a single point within the regex pattern that the engine tries to match against the target string. The simplest atom is a literal; to group part of a pattern into a single atom, use the metacharacters ( ).
To match a character that has a special meaning in regex, prefix it with a backslash ( \ ) to form an escape sequence. E.g., \. matches "." ; \+ matches "+" ; and \( matches "(" . To match a backslash itself, use \\ .
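The rules above can be checked with a short Java sketch. The class name RegexBasics and the helper matches are made up for this illustration; only Pattern.matches comes from java.util.regex. Note that inside a Java string literal every regex backslash has to be doubled, so the regex \w is written "\\w" and the regex \\ is written "\\\\".

```java
import java.util.regex.Pattern;

public class RegexBasics {
    // Hypothetical helper: true if the whole input matches the regex.
    public static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // \w covers letters, digits and underscore; \W is everything else.
        System.out.println(matches("\\w", "_"));    // true
        System.out.println(matches("\\w", "-"));    // false
        System.out.println(matches("\\W", "-"));    // true

        // A hyphen at the start or end of a class is a literal hyphen.
        System.out.println(matches("[A-Z-]", "-")); // true
        System.out.println(matches("[-A-Z]", "Q")); // true

        // Escaped metacharacters match themselves literally.
        System.out.println(matches("\\.", "."));    // true
        System.out.println(matches("\\.", "a"));    // false
        System.out.println(matches("\\\\", "\\"));  // true
    }
}
```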
Use
Pattern p = Pattern.compile("((\\w)\\2(?!\\2))((\\w)\\4)");
The matched pairs will be in groups 1 and 3 (the individual characters are in groups 2 and 4).
This works by using a negative lookahead, (?!\2), to make sure that the character following the first pair is different from the character in that pair.
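As a quick check, here is a minimal sketch that runs this pattern against the three inputs from the question. The class name DistinctPairs and the helper firstMatch are invented for the example; the pattern itself is the one given above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DistinctPairs {
    // Group 1 is the first pair, group 3 is the second pair.
    private static final Pattern PAIRS =
            Pattern.compile("((\\w)\\2(?!\\2))((\\w)\\4)");

    // Hypothetical helper: returns the first matching substring, or null if none.
    public static String firstMatch(String input) {
        Matcher m = PAIRS.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"aaaa123", "aabb123", "aa--123"}) {
            System.out.println(s + " -> " + firstMatch(s));
        }
        // Prints:
        // aaaa123 -> null
        // aabb123 -> aabb
        // aa--123 -> null
    }
}
```

Because the lookahead only inspects the single character right after the first pair, an input such as "aaabb" still matches: the engine finds "aabb" starting at the second "a".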