Java String Split On Non-Alphabetic Characters

I want to split a String into a String array along non-alphabetic characters. For example:

"Here is an ex@mple" => "Here", "is", "an", "ex", "mple"

I tried using the String.split(String regex) method with the regular expression "(?![\\p{Alpha}])". However, this splits the string into

"Here", "_is", "_an", "_ex", "@mple"

(The underscores stand in for spaces.) I guess this is because the ?! lookahead is "zero-width": the split happens at the empty position just before each non-alphabetic character, so the character itself is never removed.

How can I accomplish removal of the actual non-alpha characters while I split the string? Is there a NON-zero-width negation operator?

dmoench asked Dec 05 '12 00:12


1 Answer

You could try \P{Alpha}+:

"Here is an ex@mple".split("\\P{Alpha}+")
["Here", "is", "an", "ex", "mple"]

\P{Alpha} matches any non-alphabetic character (as opposed to \p{Alpha}, which matches any alphabetic character). The + makes each run of consecutive non-alphabetic characters count as a single delimiter, so no empty strings appear between adjacent delimiters. For example:

"a!@#$%^&*b".split("\\P{Alpha}+")
["a", "b"]
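
For anyone who wants to try this, here is a minimal runnable sketch that puts the lookahead attempt from the question next to the \P{Alpha}+ approach. The class name SplitDemo and the helper splitOnNonAlpha are made up for illustration. Two caveats worth knowing: by default Java's \p{Alpha} is the POSIX class and covers only ASCII letters, and a string that starts with a delimiter yields an empty first element.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative, not from the original post.
class SplitDemo {

    // Splits on runs of one or more non-alphabetic (non-ASCII-letter) characters.
    static String[] splitOnNonAlpha(String input) {
        return input.split("\\P{Alpha}+");
    }

    public static void main(String[] args) {
        String text = "Here is an ex@mple";

        // The zero-width lookahead from the question: it cuts *before* each
        // non-alphabetic character but consumes nothing, so the characters stay.
        System.out.println(Arrays.toString(text.split("(?![\\p{Alpha}])")));
        // prints [Here,  is,  an,  ex, @mple]

        // Consuming the non-alphabetic characters removes them from the result.
        System.out.println(Arrays.toString(splitOnNonAlpha(text)));
        // prints [Here, is, an, ex, mple]

        // A leading delimiter produces an empty first element;
        // trailing empty strings are dropped by split.
        System.out.println(Arrays.toString(splitOnNonAlpha("  Here is!")));
        // prints [, Here, is]
    }
}
```

If the input may contain non-ASCII letters, "\\P{L}+" (Unicode letter category) is one alternative to consider, since it treats characters such as "é" as letters rather than delimiters.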
arshajii answered Sep 30 '22 12:09