I'm using this regex:
x.split("[^a-zA-Z0-9']+");
This returns an array of strings with letters and/or numbers.
If I use this:
String name = "CEN01_Automated_TestCase.java";
String[] names = name.Split.split("[^a-zA-Z0-9']+");
I got:
CEN01
Automated
TestCase
Java
But if I use this:
String name = "CEN01_Automação_Caso_Teste.java";
String[] names = name.Split.split("[^a-zA-Z0-9']+");
I got:
CEN01
Automa
o
Caso
Teste
Java
How can I modify this regex to include accented characters? (á,ã,õ, etc...)
To split a string by special characters, call the split() method on the string, passing it a regular expression that matches any of the special characters as a parameter. The method will split the string on each occurrence of a special character and return an array containing the results.
The split() method splits a string into an array of substrings. The split() method returns the new array. The split() method does not change the original string. If (" ") is used as separator, the string is split between words.
If we want to split a String at one of these characters, special care has to be taken to escape these characters in the method parameters. One way we can use this is to use a backslash \ . For example: string.
From http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Categories that behave like the
java.lang.Character boolean ismethodname
methods (except for the deprecated ones) are available through the same\p{prop}
syntax where the specified property has the namejavamethodname
.
Since Character
class contains isAlphabetic
method you can use
name.split("[^\\p{IsAlphabetic}0-9']+");
You can also use
name.split("(?U)[^\\p{Alpha}0-9']+");
but you will need to use UNICODE_CHARACTER_CLASS
flag which can be used by adding (?U)
in regex.
I would check out the Java Documentation on Regular Expressions. There is a unicode section which I believe is what you may be looking for.
EDIT: Example
Another way would be to match on the character code you are looking for. For example
\uFFFF where FFFF is the hexadecimal number of the character you are trying to match.
Example: \u00E0 matches à
Realize that the backslash will need to be escaped in Java if you are using it as a string literal.
Read more about it here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With