
Split String in Java with [a-z] regular expression

Tags:

java

regex

I have two regular expressions:

[a-c] : any character from a-c

[a-z] : any character from a-z

And a test:

public static void main(String[] args) {
    String s = "abcde";
    String[] arr1 = s.split("[a-c]");
    String[] arr2 = s.split("[a-z]");

    System.out.println(arr1.length); //prints 4 : "", "", "", "de"
    System.out.println(arr2.length); //prints 0 
}

Why does the second split behave like this? I would expect a result containing 6 empty strings ("").

user711189 asked Jul 19 '13 19:07


2 Answers

According to the documentation of the single-argument String.split:

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.

To keep the trailing empty strings, you can use the two-argument version and specify a negative limit:

    String s = "abcde";
    String[] arr1 = s.split("[a-c]", -1); // ["", "", "", "de"]
    String[] arr2 = s.split("[a-z]", -1); // ["", "", "", "", "", ""]

ruakh answered Sep 23 '22 03:09


By default, split discards trailing empty strings. In the arr2 case, they were all trailing empty strings, so they were all discarded.
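
A minimal sketch of that rule, for illustration only: the class name TrailingEmptyDemo, the helper splitDefault, and the second input "deabc" are assumptions and do not come from the question. It suggests that empty strings at the start of the result are kept, while empty strings at the end are dropped.

```java
import java.util.Arrays;

public class TrailingEmptyDemo {
    // Hypothetical helper: single-argument split, which behaves like a limit of 0.
    static String[] splitDefault(String input, String regex) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        // Matches are at the start, so the empty strings are leading and are kept.
        System.out.println(Arrays.toString(splitDefault("abcde", "[a-c]"))); // [, , , de]

        // Matches are at the end, so the empty strings are trailing and are removed.
        System.out.println(Arrays.toString(splitDefault("deabc", "[a-c]"))); // [de]

        // Every character matches, so every piece is a trailing empty string.
        System.out.println(splitDefault("abcde", "[a-z]").length); // 0
    }
}
```

The middle case is the mirror image of arr1 from the question: the same three delimiters, but now the empty pieces come last and are therefore discarded.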

To get 6 empty strings, pass a negative limit as the second parameter to the split method, which will keep all trailing empty strings.

String[] arr2 = s.split("[a-z]", -1);

The documentation for the two-argument split says this about the limit n:

If n is non-positive then the pattern will be applied as many times as possible and the array can have any length.
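
To see how the limit changes the result, here is a small sketch comparing a positive, a zero, and a negative limit on the string from the question. The class name SplitLimitDemo and the helper splitWithLimit are hypothetical, and the positive limit of 3 is an arbitrary choice for illustration.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Hypothetical helper: splits on [a-z] with the given limit.
    static String[] splitWithLimit(String input, int limit) {
        return input.split("[a-z]", limit);
    }

    public static void main(String[] args) {
        String s = "abcde";

        // Positive limit: at most 3 pieces; the last piece holds the unsplit rest.
        System.out.println(Arrays.toString(splitWithLimit(s, 3))); // [, , cde]

        // Zero limit (same as the single-argument split): trailing empty strings are dropped.
        System.out.println(splitWithLimit(s, 0).length); // 0

        // Negative limit: all pieces are kept, including the trailing empty strings.
        System.out.println(splitWithLimit(s, -1).length); // 6
    }
}
```

Any negative value behaves the same way here; -1 is just the conventional choice.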

rgettman answered Sep 22 '22 03:09