For the sake of this question, let's assume I have a String which contains the values Two;.Three;.Four (and so on), where the elements are separated by the two-character delimiter ;. (a semicolon followed by a period).
Now I know there are multiple ways of splitting a string, such as split() and StringTokenizer (the latter being faster and working well), but my input file is around 1GB and I am looking for something slightly more efficient than StringTokenizer.
After some research, I found that indexOf and substring are quite efficient, but the examples I found only handle single-character delimiters or return only a single word/element.
Sample code using indexOf and substring:
String s = "quick,brown,fox,jumps,over,the,lazy,dog";
int from = s.indexOf(',');
int to = s.indexOf(',', from+1);
String brown = s.substring(from+1, to);
The above works for printing brown, but how can I use indexOf and substring to split a line on a multi-character delimiter such as ;. and display all the items, as below?
Expected output
Two
Three
Four
....and so on
The String split() method breaks a given string around matches of the given regular expression and returns the resulting substrings as a string array.
String.split(String) does not compile a regular expression when the pattern is a single character that is not a regex metacharacter (or a backslash followed by a single non-alphanumeric character). In that case it uses a specialized fast path, which is quite efficient.
Generally, StringTokenizer is faster in terms of performance, but String.split is more reliable.
The most common way of splitting a String in Java is the split() method, which splits a string into an array of substrings and returns the new array.
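For a multi-character delimiter such as ;. the fast path does not apply, and the period is a regex metacharacter, so the delimiter needs to be quoted. Here is a small sketch using the sample input from the question. The class and method names (QuotedSplit, splitLiteral) are made up for illustration, and precompiling the pattern once is my assumption about what helps when the method is called for every line of a large file.

```java
import java.util.regex.Pattern;

class QuotedSplit {
    // Compile once and reuse; Pattern.quote makes ";." match literally.
    private static final Pattern DELIMITER = Pattern.compile(Pattern.quote(";."));

    // Hypothetical helper: splits a line on the literal delimiter ";."
    static String[] splitLiteral(String line) {
        return DELIMITER.split(line);
    }

    public static void main(String[] args) {
        for (String part : splitLiteral("Two;.Three;.Four")) {
            System.out.println(part); // prints Two, Three, Four on separate lines
        }
    }
}
```

Note that, like String.split(String), Pattern.split(CharSequence) drops trailing empty strings; pass a negative limit if trailing empty fields matter.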
This is the method I use for splitting large (1GB+) tab-separated files. It is limited to a char delimiter to avoid the overhead of additional method invocations (which may be optimized out by the runtime anyway), but it can easily be converted to take a String delimiter. I'd be interested if anyone can come up with a faster method or improvements on this one.
public static String[] split(final String line, final char delimiter)
{
    // Worst case: every character is a delimiter, giving length + 1 (empty) fields.
    String[] temp = new String[line.length() + 1];
    int wordCount = 0;
    int i = 0;                          // start of the current field
    int j = line.indexOf(delimiter, 0); // first delimiter
    while (j >= 0)
    {
        temp[wordCount++] = line.substring(i, j);
        i = j + 1;
        j = line.indexOf(delimiter, i); // next delimiter
    }
    temp[wordCount++] = line.substring(i); // last field
    // Trim the oversized buffer down to the fields actually found.
    String[] result = new String[wordCount];
    System.arraycopy(temp, 0, result, 0, wordCount);
    return result;
}
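The String-delimited conversion mentioned above could look roughly like this. It is a minimal sketch rather than the answerer's own code: the class name DelimiterSplit, the List return type, and the empty-delimiter check are my assumptions. The only real change is that the search resumes delimiter.length() characters past each match instead of one.

```java
import java.util.ArrayList;
import java.util.List;

class DelimiterSplit {
    // Splits line around every occurrence of a (possibly multi-character) delimiter.
    static List<String> split(String line, String delimiter) {
        if (delimiter.isEmpty()) {
            // indexOf("") always matches at the start index, so the loop would never advance.
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> parts = new ArrayList<>();
        int start = 0;                            // start of the current field
        int end = line.indexOf(delimiter, start); // first delimiter
        while (end >= 0) {
            parts.add(line.substring(start, end));
            start = end + delimiter.length();     // skip the whole delimiter
            end = line.indexOf(delimiter, start); // next delimiter
        }
        parts.add(line.substring(start));         // last field
        return parts;
    }

    public static void main(String[] args) {
        for (String part : split("Two;.Three;.Four", ";.")) {
            System.out.println(part); // prints Two, Three, Four on separate lines
        }
    }
}
```

Unlike String.split, this keeps empty fields, including trailing ones, so a line ending in the delimiter yields a final empty string.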
StringTokenizer is generally faster than String.split().
import java.util.StringTokenizer;

public class StringTokenizerExample {
    public static void main(String[] args) {
        String str = "This is String , split by StringTokenizer, created by me";
        // With no delimiter argument, StringTokenizer splits on whitespace.
        StringTokenizer st = new StringTokenizer(str);
        System.out.println("---- Split by space ------");
        while (st.hasMoreTokens()) {
            System.out.println(st.nextToken());
        }
        System.out.println("---- Split by comma ',' ------");
        StringTokenizer st2 = new StringTokenizer(str, ",");
        while (st2.hasMoreTokens()) {
            System.out.println(st2.nextToken());
        }
    }
}
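One caveat for the ;. case in the question: StringTokenizer treats its delimiter argument as a set of individual characters, not as one multi-character delimiter. The sketch below illustrates this; the class name, the helper method, and the second input string are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDelimiterSet {
    // Hypothetical helper: collects all tokens into a list.
    static List<String> tokens(String line, String delimiterChars) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, delimiterChars);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Works here only because the fields contain neither ';' nor '.'.
        System.out.println(tokens("Two;.Three;.Four", ";.")); // [Two, Three, Four]
        // Fields that contain '.' are broken apart as well.
        System.out.println(tokens("v1.0;.v2.0", ";."));       // [v1, 0, v2, 0]
    }
}
```

If the fields can contain either delimiter character on its own, an indexOf/substring loop or a quoted regex is the safer choice.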