I am trying to match a sequence of Unicode characters:
Pattern pattern = Pattern.compile("\\u05[dDeE][0-9a-fA-F]{2,}");
String text = "\\n     \\u05db\\u05d3\\u05d5\\u05e8\\u05d2\\u05dc\\n    <\\/span>\\n<br style=\\";
Matcher match = pattern.matcher(text);
but doing so gives this exception:
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal Unicode escape sequence near index 4
  \u05[dDeE][0-9a-fA-F]+
      ^
How can I still use regex metacharacters (like "[") when matching Unicode characters?
EDIT: I'm trying to parse some text. Somewhere in the text there is a sequence of Unicode characters whose code point range I know.
Edit 2:
I am now using a range instead: [\\u05d0-\\u05ea]{2,} but I still can't match the text above.
Edit 3: OK, it's working now. The problem was that I used two backslashes instead of one, both in the regex and in the text. The solution, assuming there will be two or more characters:
[\u05d0-\u05ea]{2,}
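To make the difference between one and two backslashes concrete, here is a minimal sketch. The class and method names (HebrewRunDemo, firstHebrewRun) are made up for illustration. It assumes the input holds real Hebrew characters in one case and escaped text (a backslash, a u, and four hex digits) in the other.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HebrewRunDemo {
    // Single backslashes: the Java compiler turns each escape into the actual
    // Hebrew character, so the regex engine sees a plain character range.
    static final Pattern HEBREW_RUN = Pattern.compile("[\u05d0-\u05ea]{2,}");

    // Returns the first run of two or more Hebrew letters, or null if there is none.
    static String firstHebrewRun(String text) {
        Matcher m = HEBREW_RUN.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Real Hebrew characters (single backslash in the source).
        String real = "\n \u05db\u05d3\u05d5\u05e8\u05d2\u05dc\n </span>";
        // Escaped text: a literal backslash, 'u', and four hex digits each time.
        String escaped = "\\n \\u05db\\u05d3\\u05d5\\u05e8\\u05d2\\u05dc\\n <\\/span>";

        System.out.println(firstHebrewRun(real).length()); // 6
        System.out.println(firstHebrewRun(escaped));       // null
    }
}
```

The range only sees characters, so it cannot match the escaped form of the text; that form needs a pattern that matches the backslash and hex digits themselves.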
The backslash \ is an escape character in Java string literals, so it has a predefined meaning there. To put a single backslash into a string you have to write a double backslash \\. For example, to use \w in a regex you must write \\w in the string literal.
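A small sketch of that rule, using a made-up class name (BackslashDemo): the string literal "\\w+" holds three characters at run time, and those three characters are what the regex engine receives.

```java
import java.util.regex.Pattern;

class BackslashDemo {
    // "\\w+" in the source is the three-character regex \w+ at run time.
    static boolean isWord(String s) {
        return Pattern.matches("\\w+", s);
    }

    public static void main(String[] args) {
        System.out.println("\\w+".length());   // 3: backslash, 'w', '+'
        System.out.println(isWord("abc_123")); // true
        System.out.println(isWord("abc 123")); // false, the space is not a word character
    }
}
```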
\p{L} matches a single code point in the category "letter". \p{N} matches any kind of numeric character in any script. Source: regular-expressions.info.
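As a rough illustration of these category classes in Java (the class and method names are invented for this sketch), \p{L} accepts Hebrew letters without naming a code point range, and \p{N} accepts digits from other scripts:

```java
import java.util.regex.Pattern;

class UnicodeCategoryDemo {
    // True if every character in s is in the Unicode category "letter".
    static boolean allLetters(String s) {
        return Pattern.matches("\\p{L}+", s);
    }

    // True if every character in s is in the Unicode category "number".
    static boolean allNumeric(String s) {
        return Pattern.matches("\\p{N}+", s);
    }

    public static void main(String[] args) {
        System.out.println(allLetters("\u05db\u05d3\u05d5\u05e8\u05d2\u05dc")); // true
        System.out.println(allLetters("abc1"));      // false, '1' is not a letter
        System.out.println(allNumeric("42\u0663"));  // true, U+0663 is an Arabic-Indic digit
    }
}
```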
\u000d — Carriage return — \r. \u2028 — Line separator. \u2029 — Paragraph separator.
Here is what is causing the exception:
\\u05[dDeE][0-9a-fA-F]{2,}
  ^^^^
The Java regular expression parser thinks you are trying to match a Unicode code point with the escape sequence \uNNNN, and it throws an exception because \u requires four hexadecimal digits after it and there are only two, namely 05. If a code point is what you actually want, you need to supply all four digits, for example \\u0005.
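The parser's behavior can be checked with a small sketch. The helper name compiles is hypothetical; it only reports whether Pattern.compile accepts a given regex.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class UnicodeEscapeCheck {
    // Returns true if the regex compiles, false if the parser rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Only two hex digits follow the u before '[': rejected by the regex parser.
        System.out.println(compiles("\\u05[dDeE][0-9a-fA-F]{2,}")); // false
        // Four hex digits: a complete escape for U+05D0.
        System.out.println(compiles("\\u05d0"));                     // true
    }
}
```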
On the other hand, if you want to match \\u (two literal backslashes followed by u) in the target string, each literal backslash takes four backslashes in the Java source: \\\\ matches one backslash, so matching \\u requires \\\\\\\\u.
\\\\\\\\u05[dDeE][0-9a-fA-F]{2,}
Finally, to match a run of two or more such escape sequences in your target string, the quantifier has to apply to the whole escape rather than to the last hex digit, so wrap the expression in a non-capturing group:
(?:\\\\\\\\u05[dDeE][0-9a-fA-F]){2,}
Edit: Since there is only one backslash before each u in your target string, your regular expression should be:
(?:\\\\u05[dDeE][0-9a-fA-F]){2,}
This will match \u05db\u05d3\u05d5\u05e8\u05d2\u05dc in your string:
<\/span><\/span><span dir=\"rtl\">\n \u05db\u05d3\u05d5\u05e8\u05d2\u05dc\n <\/span>\n<br style=\"clear : both; font-size : 1px;\">\n<\/div>"}, 200, null, null);
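A minimal sketch of that pattern in use, assuming the target string contains the escapes as plain text (one backslash, a u, four hex digits). The names LiteralEscapeDemo and firstEscapedRun are illustrative only, and the sample text is a shortened version of the string above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralEscapeDemo {
    // Four backslashes in the source become two in the regex,
    // which match one literal backslash in the input.
    static final Pattern ESCAPED_RUN =
            Pattern.compile("(?:\\\\u05[dDeE][0-9a-fA-F]){2,}");

    // Returns the first run of two or more escaped code points, or null.
    static String firstEscapedRun(String text) {
        Matcher m = ESCAPED_RUN.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Each \\u here is a literal backslash followed by 'u', not a Hebrew character.
        String text = "\\n     \\u05db\\u05d3\\u05d5\\u05e8\\u05d2\\u05dc\\n    <\\/span>";
        String run = firstEscapedRun(text);
        System.out.println(run.length()); // 36: six escapes of six characters each
    }
}
```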
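Edit 2: If you want to match the literal text \u05db\u05d3\u05d5\u05e8\u05d2\u05dc (a backslash, a u, and four hex digits each time), then you can't use a range.
On the other hand, if you want to match actual Unicode code points between 05d0 and 05df, then you can use a range, with a hyphen between the two endpoints:
(?:[\\u05d0-\\u05df]){2,}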
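The endpoints of the range matter. As a sketch (the class name and the runs helper are hypothetical), the word from the question contains U+05E8, which lies outside 05d0 to 05df, so that range splits the word into two runs, while the range 05d0 to 05ea from the question matches it whole:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HebrewRangeDemo {
    // Collects every match of the regex in the text, in order.
    static List<String> runs(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Real Hebrew characters; the fourth one is U+05E8.
        String word = "\u05db\u05d3\u05d5\u05e8\u05d2\u05dc";
        // Double backslashes: the regex engine, not the compiler, resolves the escapes.
        System.out.println(runs("(?:[\\u05d0-\\u05df]){2,}", word).size()); // 2
        System.out.println(runs("(?:[\\u05d0-\\u05ea]){2,}", word).size()); // 1
    }
}
```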