Punctuation Regex in Java

Tags: java, regex

First, I read the documentation here:

http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html

I want to find any punctuation character EXCEPT @',& but I don't quite understand how to do it.

Here is my code:

public static void main( String[] args )
{       
     // String to be scanned to find the pattern.
     String value = "#`~!#$%^";
     String pattern = "\\p{Punct}[^@',&]";

    // Create a Pattern object
    Pattern r = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);

    // Now create matcher object.
    Matcher m = r.matcher(value);
    if (m.find()) {
       System.out.println("Found value: " + m.groupCount());
    } else {
       System.out.println("NO MATCH");
    }


}

The result is NO MATCH.
Is there a mistake in my pattern?

Thanks
MRizq

MRizq asked Nov 20 '11 10:11



2 Answers

You may use character subtraction here:

String pat = "[\\p{Punct}&&[^@',&]]";

The whole pattern represents a character class, [...], that contains a \p{Punct} POSIX character class, the && intersection operator and [^...] negated character class.
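A minimal sketch of the intersection in action, assuming the default (ASCII-only) meaning of \p{Punct}. The class name PunctIntersectionDemo, the helper findAll, and the sample input are illustrative and not part of the original answer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PunctIntersectionDemo {
    // ASCII punctuation intersected with "anything except @ ' , &".
    private static final Pattern PUNCT_EXCEPT =
            Pattern.compile("[\\p{Punct}&&[^@',&]]");

    // Collect every single-character match in order of appearance.
    public static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = PUNCT_EXCEPT.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // @ , & and ' are punctuation but excluded; letters are not punctuation at all.
        System.out.println(findAll("a@b,c#d!e&f'g")); // [#, !]
    }
}
```

Because the whole expression is a single character class, each match is exactly one character long.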

A Unicode modifier, (?U) (the embedded form of Pattern.UNICODE_CHARACTER_CLASS, available since Java 7), might be necessary if you also plan to match Unicode punctuation:

String pat = "(?U)[\\p{Punct}&&[^@',&]]";
              ^^^^

The pattern matches any punctuation character (with \p{Punct}) except @, ', the comma (,), and &.
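One thing worth checking before adding (?U): it does not only extend \p{Punct}, it redefines it as the Unicode punctuation category, so ASCII characters that Unicode classifies as symbols (such as `, ~, $ and ^) stop matching. The sketch below compares both variants. The class name UnicodePunctDemo, the helper collect, and the sample input are assumptions made for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodePunctDemo {
    // Concatenate every match of regex found in input, in order.
    public static String collect(String regex, String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ascii = "[\\p{Punct}&&[^@',&]]";
        String unicode = "(?U)" + ascii;
        // The last character is the inverted question mark, a non-ASCII punctuation character.
        String input = "#`~!$%^,\u00BF";
        // Default mode: all ASCII punctuation except the excluded comma.
        System.out.println(collect(ascii, input));
        // Unicode mode: only #, ! and % from the ASCII part, plus the inverted question mark.
        System.out.println(collect(unicode, input));
    }
}
```

If both the ASCII symbols and non-ASCII punctuation are needed, the two behaviours would have to be combined explicitly rather than by the flag alone.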

If you need to exclude more characters, add them to the negated character class. Just remember to always escape -, \, ^, [ and ] inside a Java regex character class/set. E.g. adding a backslash and - might look like "[\\p{Punct}&&[^@',&\\\\-]]" or "[\\p{Punct}&&[^@',&\\-\\\\]]".
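As a quick check of the two spellings above, the following sketch runs both against the same input. The class name ExtraExclusionsDemo, the constant names, and the sample string are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtraExclusionsDemo {
    // Both spellings exclude @ ' , & plus the backslash and the hyphen.
    public static final String HYPHEN_LAST  = "[\\p{Punct}&&[^@',&\\\\-]]";
    public static final String HYPHEN_FIRST = "[\\p{Punct}&&[^@',&\\-\\\\]]";

    // Concatenate every match of regex found in input, in order.
    public static String collect(String regex, String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The string holds: a-b, one backslash, c#d_e
        String input = "a-b\\c#d_e";
        System.out.println(collect(HYPHEN_LAST, input));  // #_
        System.out.println(collect(HYPHEN_FIRST, input)); // #_
    }
}
```

Note the doubling: four backslashes in the Java string literal become two in the regex, which in turn stand for one literal backslash.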

Java demo:

String value = "#`~!#$%^,";
String pattern = "(?U)[\\p{Punct}&&[^@',&]]";
Pattern r = Pattern.compile(pattern);    // Create a Pattern object
Matcher m = r.matcher(value);            // Now create matcher object.
while (m.find()) {
    System.out.println("Found value: " + m.group());
}

Output (the trailing comma is not reported because it is in the excluded set):

Found value: #
Found value: !
Found value: #
Found value: %
Wiktor Stribiżew answered Sep 24 '22 03:09


Your pattern matches two characters, not one: \p{Punct} consumes a punctuation character and [^@',&] then consumes the character after it. Using a negative lookahead should solve this:

(?![@',&])\\p{Punct}
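A small sketch comparing the pattern from the question with the lookahead version. The class name LookaheadDemo, the helper findAll, and the input "@#,!" are chosen here for illustration only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Collect every match of regex found in input, in order.
    public static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "@#,!";
        // Pattern from the question: a punctuation character followed by one more
        // character that is not excluded, so every match is two characters long.
        System.out.println(findAll("\\p{Punct}[^@',&]", input));    // [@#, ,!]
        // Lookahead version: first reject @ ' , & without consuming anything,
        // then consume exactly one punctuation character.
        System.out.println(findAll("(?![@',&])\\p{Punct}", input)); // [#, !]
    }
}
```

The first result also shows that the original pattern can start a match on an excluded character such as @, since the exclusion applies only to the second position.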
Lucero answered Sep 22 '22 03:09