
Pattern: how to subtract a matched character in a character class?

Tags: java, regex

Is it possible to subtract a matched character in a character class?

The Java docs have examples of character classes with subtraction:

[a-z&&[^bc]]    - a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]   - a through z, and not m through p: [a-lq-z] (subtraction)
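
For illustration, here is a minimal sketch of how the first documented subtraction behaves with literal characters. The class name SubtractionDemo and the helper matchesSingle are made up for this example; only Pattern.compile and Matcher.matches come from java.util.regex.

```java
import java.util.regex.Pattern;

public class SubtractionDemo {
    // a through z, except b and c (equivalent to [ad-z])
    static final Pattern NOT_BC = Pattern.compile("[a-z&&[^bc]]");

    // Hypothetical helper: true when the whole input is one allowed character.
    static boolean matchesSingle(String s) {
        return NOT_BC.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesSingle("a")); // true
        System.out.println(matchesSingle("b")); // false
        System.out.println(matchesSingle("d")); // true
    }
}
```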

I want to write a pattern that matches two consecutive pairs of word characters, but only when the two pairs are not the same:

1) "aaaa123" - should NOT match
2) "aabb123" - should match "aabb" part
3) "aa--123" - should NOT match

I am close to success with the following pattern:

([\w])\1([\w])\2

but of course it also matches in case 1, so I need to subtract the character matched by the first group. But when I try to do this:

Pattern p = Pattern.compile("([\\w])\\1([\\w&&[^\\1]])\\2");

I get an exception:

Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence near index 17
([\w])\1([\w&&[^\1]])\2
                 ^
    at java.util.regex.Pattern.error(Pattern.java:1713)

So it seems subtraction does not work with group references, only with explicitly listed characters. The following pattern compiles with no problems:

Pattern p = Pattern.compile("([\\w])\\1([\\w&&[^a]])\\2");

Is there any other way to write such a pattern?

asked Feb 07 '12 09:02 by Laimoncijus




1 Answer

Use

Pattern p = Pattern.compile("((\\w)\\2(?!\\2))((\\w)\\4)");

The two pairs will be in groups 1 and 3 (the individual characters are in groups 2 and 4).

This works by using a negative lookahead, (?!\2), right after the first pair: it asserts that the next character differs from the character of the first pair, so the second pair cannot repeat it.
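
As a quick check, the pattern can be run against the three inputs from the question. This is only a sketch: the class name PairMatchDemo and the helper firstMatch are invented for the example and are not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PairMatchDemo {
    // Pattern from the answer: group 1 = first pair, group 3 = second pair.
    static final Pattern PAIRS = Pattern.compile("((\\w)\\2(?!\\2))((\\w)\\4)");

    // Hypothetical helper: returns the first matched text, or null if there is no match.
    static String firstMatch(String input) {
        Matcher m = PAIRS.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"aaaa123", "aabb123", "aa--123"}) {
            System.out.println(s + " -> " + firstMatch(s));
        }
        // aaaa123 -> null
        // aabb123 -> aabb
        // aa--123 -> null
    }
}
```

Since find() scans the whole input, the match does not have to start at index 0; for "xaabb" it would still report "aabb".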

answered Nov 03 '22 02:11 by flesk