
String.split() - matching leading empty String prior to first delimiter?

Tags: java, regex

I need to be able to split an input String by commas, semi-colons or white-space (or a mix of the three). I would also like to treat multiple consecutive delimiters in the input as a single delimiter. Here's what I have so far:

String regex = "[,;\\s]+";    
return input.split(regex);

This works, except for when the input string starts with one of the delimiter characters, in which case the first element of the result array is an empty String. I do not want my result to have empty Strings, so that something like, ",,,,ZERO; , ;;ONE ,TWO;," returns just a three element array containing the capitalized Strings.

Is there a better way to do this than stripping out any leading characters that match my reg-ex prior to invoking String.split?

Thanks in advance!

AndreiM asked Apr 28 '10 19:04


4 Answers

No, there isn't. String's split() can only discard trailing empty strings, which it does when the limit (its second parameter) is 0. The one-argument split(regex) behaves the same way:

return input.split(regex, 0);

but for leading delimiters, you'll have to strip them first:

return input.replaceFirst("^"+regex, "").split(regex, 0);
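As a rough sketch of the strip-then-split approach above, the snippet below wraps it in a small helper. The class and method names (LeadingDelimiterSplit, tokens) are made up for illustration. It also guards one edge case the one-liner does not cover: if the input consists only of delimiters, stripping leaves an empty string, and splitting an empty string yields a one-element array containing "".

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class LeadingDelimiterSplit {
    // Same delimiter class as in the question: commas, semicolons, whitespace.
    private static final Pattern DELIMS = Pattern.compile("[,;\\s]+");
    // The same run of delimiters, anchored to the start of the input.
    private static final Pattern LEADING_DELIMS = Pattern.compile("^[,;\\s]+");

    static String[] tokens(String input) {
        // Strip a leading delimiter run so split() produces no empty first element.
        String trimmed = LEADING_DELIMS.matcher(input).replaceFirst("");
        // Splitting "" would give [""], so return an empty array explicitly.
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        // Pattern.split(CharSequence) uses limit 0, so trailing empty strings are dropped.
        return DELIMS.split(trimmed);
    }

    public static void main(String[] args) {
        // prints [ZERO, ONE, TWO]
        System.out.println(Arrays.toString(tokens(",,,,ZERO; , ;;ONE ,TWO;,")));
    }
}
```

Precompiling the two patterns avoids recompiling the regex on every call, which String.replaceFirst and String.split would otherwise do.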
Bart Kiers answered Sep 23 '22 14:09


If by "better" you mean higher performance, then you might want to try writing a regular expression that matches the tokens themselves (rather than the delimiters), calling Matcher.find in a loop, and collecting each match as you find it. This avoids modifying the string first. But measure it for yourself to see which is faster for your data.

If by "better" you mean simpler, then no, I don't think there is a simpler way than the one you suggested: removing the leading separators before applying the split.
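The Matcher.find approach could look roughly like the sketch below. It assumes a token is any run of characters that are not commas, semicolons or whitespace, which is the complement of the delimiter class in the question. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the Matcher.find loop.
final class TokenFinder {
    // One token: a run of characters that are NOT delimiters.
    private static final Pattern TOKEN = Pattern.compile("[^,;\\s]+");

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        // Each find() advances to the next token; delimiters are simply skipped.
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [ZERO, ONE, TWO]
        System.out.println(tokens(",,,,ZERO; , ;;ONE ,TWO;,"));
    }
}
```

Because the pattern only ever matches non-empty runs, leading, trailing and repeated delimiters cannot produce empty strings, and an input with no tokens gives an empty list.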

Mark Byers answered Sep 25 '22 14:09


Pretty much all splitting facilities built into the JDK are broken one way or another. You'd be better off using a third-party class such as Guava's Splitter, which is both flexible and correct in how it handles empty tokens and whitespace (on Guava 19 or later, CharMatcher.whitespace() replaces the older CharMatcher.WHITESPACE constant):

Splitter.on(CharMatcher.anyOf(";,").or(CharMatcher.whitespace()))
    .omitEmptyStrings()
    .split(",,,ZERO;,ONE TWO");

This yields an Iterable<String> containing "ZERO", "ONE", "TWO".
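If adding a third-party dependency is not an option, a similar omit-empty-strings effect can be approximated with the JDK alone. The sketch below assumes Java 8 or later (for Pattern.splitAsStream) and is not equivalent to Splitter in every respect. The class and method names are invented for the example.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical JDK-only stand-in for "split and omit empty strings".
final class NonEmptySplit {
    private static final Pattern DELIMS = Pattern.compile("[,;\\s]+");

    static List<String> tokens(String input) {
        return DELIMS.splitAsStream(input)
                // Drop the empty first element produced by a leading delimiter run.
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [ZERO, ONE, TWO]
        System.out.println(tokens(",,,ZERO;,ONE TWO"));
    }
}
```

Filtering after the split keeps the original delimiter regex unchanged, at the cost of creating and discarding one empty string when the input starts with a delimiter.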

Julien Silland answered Sep 24 '22 14:09


You could also potentially use StringTokenizer to build the list, depending on what you need to do with it:

StringTokenizer st = new StringTokenizer(",,,ZERO;,ONE TWO", ",; ", false);
while(st.hasMoreTokens()) {
  String str = st.nextToken();
  //add to list, process, etc...
}

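As a caveat, however, every whitespace character you want treated as a delimiter must be listed individually in the second argument to the constructor; the delimiter string above covers only the space character, not tabs or newlines. Note also that the JDK documentation describes StringTokenizer as a legacy class whose use is discouraged in new code.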
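To illustrate the caveat, here is a minimal sketch that collects the tokens into a list and spells out several whitespace characters in the delimiter string. It uses space, tab, newline, carriage return and form feed, the same set StringTokenizer uses by default, which may still not cover every character that \s matches in a regex. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper around StringTokenizer.
final class TokenizerSplit {
    // Comma and semicolon, plus each whitespace character listed explicitly.
    private static final String DELIMS = ",; \t\n\r\f";

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        // The two-argument constructor does not return the delimiters as tokens.
        StringTokenizer st = new StringTokenizer(input, DELIMS);
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [ZERO, ONE, TWO]
        System.out.println(tokens(",,,ZERO;,\tONE\nTWO"));
    }
}
```

StringTokenizer skips runs of delimiters, including leading ones, so it never returns empty tokens.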

mtruesdell answered Sep 23 '22 14:09