I have a String like this
أصبح::ينال::أخذ::حصل (على)::أحضر
I want to split it on non-Arabic characters using Java.
Here's my code:
import java.util.Arrays;
String s = "أصبح::ينال::أخذ::حصل (على)::أحضر";
String[] arr = s.split("^\\p{InArabic}+");
System.out.println(Arrays.toString(arr));
The output was:
[, ::ينال::أخذ::حصل (على)::أحضر]
But I expected the output to be:
[ينال,أخذ,حصل,على,أحضر]
What is wrong with my pattern?
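You need a negated class, and for that you need square brackets [ ... ]. Outside of square brackets, ^ is an anchor for the start of the input, so your pattern only matches the leading run of Arabic characters. Try splitting with this instead: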
"[^\\p{InArabic}]+"
If \\p{InArabic} matches any Arabic character, then [^\\p{InArabic}] will match any non-Arabic character.
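As a rough illustration of the negated class, here is a minimal sketch applied to the string from the question. The class name SplitOnNonArabic and the helper splitOnNonArabic are made up for this example and are not part of any library.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class SplitOnNonArabic {

    // Splits on every run of characters outside the Arabic Unicode block.
    static String[] splitOnNonArabic(String s) {
        return s.split("[^\\p{InArabic}]+");
    }

    public static void main(String[] args) {
        String s = "أصبح::ينال::أخذ::حصل (على)::أحضر";
        // Prints the six Arabic words: [أصبح, ينال, أخذ, حصل, على, أحضر]
        System.out.println(Arrays.toString(splitOnNonArabic(s)));
    }
}
```

Note that the first word is part of the result too. If the input started with non-Arabic characters, the first element would be an empty string, which you may want to filter out.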
Another option you can consider is an equivalent syntax, using P instead of p to indicate the opposite of the \\p{InArabic} character class, as @Pshemo mentioned:
"\\P{InArabic}+"
This works the same way that \\W is the opposite of \\w.
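To check that the two spellings behave the same, a small sketch like the following compares both splits on the same input. The class NegatedPropertyDemo and the method sameResult are hypothetical names chosen for this example.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class NegatedPropertyDemo {

    // Returns true when the bracketed negation and the \P form split s identically.
    static boolean sameResult(String s) {
        String[] withBrackets = s.split("[^\\p{InArabic}]+");
        String[] withCapitalP = s.split("\\P{InArabic}+");
        return Arrays.equals(withBrackets, withCapitalP);
    }

    public static void main(String[] args) {
        String s = "أصبح::ينال::أخذ::حصل (على)::أحضر";
        System.out.println(Arrays.toString(s.split("\\P{InArabic}+")));
        System.out.println(sameResult(s)); // true for this input
    }
}
```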
The only possible advantage of the first syntax over the second (again, as @Pshemo mentioned) is that it is easier to add other characters to the set that should not be matched. For example, if you want to match everything that is not \\p{InArabic} except periods, the first one is more flexible:
"[^\\p{InArabic}.]+"
                ^
Otherwise, if you really want to use \\P{InArabic}, you'll need subtraction within classes, which Java expresses as an intersection (&&) with a negated class:
"[\\P{InArabic}&&[^.]]+"
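Here is a minimal sketch of both period-preserving patterns, assuming a made-up input in which two words are joined by a period. The class KeepPeriodsDemo and its two helper methods are hypothetical names for this example.

```java
import java.util.Arrays;

// Hypothetical demo class; the names and the sample input are illustrative only.
class KeepPeriodsDemo {

    // Negated class with the period added to the excluded set.
    static String[] splitWithNegatedClass(String s) {
        return s.split("[^\\p{InArabic}.]+");
    }

    // \P form intersected with "anything but a period".
    static String[] splitWithIntersection(String s) {
        return s.split("[\\P{InArabic}&&[^.]]+");
    }

    public static void main(String[] args) {
        String s = "حصل.على::أحضر";
        // Both calls split only on "::", so the period stays inside the first token.
        System.out.println(Arrays.toString(splitWithNegatedClass(s)));
        System.out.println(Arrays.toString(splitWithIntersection(s)));
    }
}
```

Both patterns should yield two tokens for this input, with the period kept in the first one.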