Most efficient way of splitting String in Java

For the sake of this question, let's assume I have a String containing the values Two;.Three;.Four (and so on), where the elements are separated by the two-character delimiter ";." (a semicolon followed by a period).

I know there are multiple ways of splitting a string, such as split() and StringTokenizer (the faster of the two, and it works well), but my input file is around 1GB and I am looking for something slightly more efficient than StringTokenizer.

After some research, I found that indexOf and substring are quite efficient, but the examples I found either handle only a single-character delimiter or return only a single word/element.

Sample code using indexOf and substring:

String s = "quick,brown,fox,jumps,over,the,lazy,dog";
int from = s.indexOf(',');
int to = s.indexOf(',', from+1);
String brown = s.substring(from+1, to);

The above works for printing brown, but how can I use indexOf and substring to split a line with a multi-character delimiter and display all the items as shown below?

Expected output

Two
Three
Four
....and so on
asked Mar 25 '15 22:03 by user92038111111



2 Answers

This is the method I use for splitting large (1GB+) tab-separated files. It is limited to a single char delimiter to avoid the overhead of additional method invocations (overhead the runtime may well optimize away anyway), but it can be easily converted to take a String delimiter. I'd be interested if anyone can come up with a faster method or improvements on this one.

public static String[] split(final String line, final char delimiter)
{
    // Worst case: every character is a delimiter, giving length + 1 (empty) tokens.
    String[] temp = new String[line.length() + 1];
    int wordCount = 0;
    int i = 0; // start index of the current token
    int j = line.indexOf(delimiter, 0); // first delimiter

    while (j >= 0)
    {
        temp[wordCount++] = line.substring(i, j);
        i = j + 1;
        j = line.indexOf(delimiter, i); // next delimiter
    }

    temp[wordCount++] = line.substring(i); // last token

    // Trim the oversized buffer to the number of tokens actually found.
    String[] result = new String[wordCount];
    System.arraycopy(temp, 0, result, 0, wordCount);

    return result;
}
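
For the ";." delimiter in the question, the String-delimited conversion mentioned above could look roughly like the sketch below. This is not the answerer's code and has not been benchmarked; the class name DelimiterSplit and the choice of an ArrayList over a preallocated array are assumptions made for illustration. The only change to the loop is that the search position advances by the delimiter's length instead of by 1.

```java
import java.util.ArrayList;
import java.util.List;

public class DelimiterSplit {

    // Hypothetical helper: splits on a literal multi-character delimiter (no regex).
    public static List<String> split(final String line, final String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        final List<String> parts = new ArrayList<>();
        final int step = delimiter.length();
        int start = 0;                            // start index of the current token
        int end = line.indexOf(delimiter, start); // first delimiter
        while (end >= 0) {
            parts.add(line.substring(start, end));
            start = end + step;                   // skip the whole delimiter
            end = line.indexOf(delimiter, start); // next delimiter
        }
        parts.add(line.substring(start));         // last token
        return parts;
    }

    public static void main(String[] args) {
        // Prints Two, Three and Four on separate lines.
        for (String part : split("Two;.Three;.Four", ";.")) {
            System.out.println(part);
        }
    }
}
```

Like the char version, this keeps empty tokens between adjacent delimiters, so "a;.;.b" yields "a", "" and "b".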
answered Oct 16 '22 03:10 by vallismortis


StringTokenizer is generally faster than String.split().

import java.util.StringTokenizer;

public class StringTokenizerExample {

    public static void main(String[] args) {

        String str = "This is String , split by StringTokenizer, created by me";

        // With no delimiter argument, StringTokenizer splits on whitespace.
        StringTokenizer st = new StringTokenizer(str);

        System.out.println("---- Split by space ------");
        while (st.hasMoreTokens()) {
            System.out.println(st.nextToken());
        }

        System.out.println("---- Split by comma ',' ------");
        // Each character of the second argument is treated as a separate delimiter.
        StringTokenizer st2 = new StringTokenizer(str, ",");

        while (st2.hasMoreTokens()) {
            System.out.println(st2.nextToken());
        }
    }
}
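
One caveat for the ";." delimiter from the question: StringTokenizer treats its delimiter argument as a set of individual characters, not as one multi-character sequence, and it never returns empty tokens. The sketch below illustrates this; the class name TokenizerDelimiterSet, the helper tokens and the second sample input are made up for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDelimiterSet {

    // Hypothetical helper: collects all tokens produced by StringTokenizer.
    static List<String> tokens(String input, String delimiterChars) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, delimiterChars);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Works for the question's input because no token contains ';' or '.'.
        System.out.println(tokens("Two;.Three;.Four", ";.")); // [Two, Three, Four]

        // Tokens that themselves contain '.' are broken apart.
        System.out.println(tokens("v1.2;.v1.3", ";."));       // [v1, 2, v1, 3]
    }
}
```

So StringTokenizer with ";." is only safe when neither ';' nor '.' can occur inside an element and empty elements do not need to be preserved.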
answered Oct 16 '22 01:10 by user92038111111