
Memory issues with String.split()

My program currently has memory problems, and on checking the app we've discovered that the String.split() method uses a lot of memory. I've tried using a StreamTokenizer, but that seems to make things even more complex.

Is there a better way to split long Strings into small Strings that uses less memory than the String.split() method?

Myth Pro asked Aug 09 '12 12:08



1 Answer

It is highly unlikely that any realistic use of split would "consume lots of memory". Your input would have to be huge (many, many megabytes) and your result split into many millions of parts for it to even be noticed.
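
For the rare case where the input really is that large, one option is to walk the string with indexOf and hand each part to a callback, so the full String[] is never built. The following is only a minimal sketch, assuming Java 8+ and a single-character delimiter; LazySplit and forEachPart are hypothetical names for illustration, not JDK API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class LazySplit {
    // Hypothetical helper: calls action once per part instead of building a String[].
    public static void forEachPart(String input, char delimiter, Consumer<String> action) {
        int start = 0;
        int end;
        // indexOf scans forward from 'start'; no regex is involved
        while ((end = input.indexOf(delimiter, start)) >= 0) {
            action.accept(input.substring(start, end));
            start = end + 1;
        }
        // Whatever follows the last delimiter (possibly an empty string)
        action.accept(input.substring(start));
    }

    public static void main(String[] args) {
        List<String> parts = new ArrayList<>();
        forEachPart("a,b,,c", ',', parts::add);
        System.out.println(parts); // prints [a, b, , c]
    }
}
```

Each part becomes garbage as soon as the callback is done with it, so peak memory depends on what the callback retains rather than on the number of parts. Unlike String.split(), this sketch keeps trailing empty parts.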

Here's some code that creates a random string of approximately 1.8 million characters, splits it into over 1 million Strings, and prints the memory used and time taken.

As you can see, it ain't much: about 72 MB (71,740,240 bytes) consumed in roughly 350 ms.

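public class SplitBenchmark {
    public static void main(String[] args) throws Exception {
        // Build the input: 99,999 random doubles appended back to back
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 99999; i++) {
            sb.append(Math.random());
        }
        // Note: the timer starts here, so it also covers toString() and System.gc()
        long begin = System.currentTimeMillis();
        String string = sb.toString();
        sb = null; // drop the builder so the gc() hint can reclaim it
        System.gc();
        long startFreeMem = Runtime.getRuntime().freeMemory();
        // Zero-width lookahead: split before every digit 0-5, keeping the digit
        String[] strings = string.split("(?=[0-5])");
        long endFreeMem = Runtime.getRuntime().freeMemory();
        long execution = System.currentTimeMillis() - begin;

        // The drop in free heap is only a rough estimate of what the split allocated
        System.out.println("input length = " + string.length() + "\nnumber of strings after split = " + strings.length + "\nmemory consumed due to split = "
                + (startFreeMem - endFreeMem) + "\nexecution time = " + execution + "ms");
    }
}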

Output (run on a fairly typical Windows box):

input length = 1827035
number of strings after split = 1072788
memory consumed due to split = 71740240
execution time = 351ms

Interestingly, without the System.gc() call the benchmark reported only about 40% as much memory (29,582,328 bytes versus 71,740,240):

memory consumed due to split = 29582328
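
The swing between those two figures is a reminder that a freeMemory() difference is a rough estimate: freeMemory() only reports free space inside the heap the JVM has currently allocated, the heap can grow, and a collection can run in the middle of the split. A slightly steadier sketch, under the assumption that "used" means totalMemory() minus freeMemory(), and timing only the split call itself; SplitMemory and usedMemory are hypothetical names, and the result is still an estimate because System.gc() is only a hint.

```java
public class SplitMemory {
    // Heap currently in use: allocated heap minus the free part of it.
    public static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        // Same input shape as the benchmark: random doubles appended back to back
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 99999; i++) {
            sb.append(Math.random());
        }
        String input = sb.toString();
        sb = null;

        System.gc(); // a hint only; the JVM may ignore it
        long before = usedMemory();
        long t0 = System.nanoTime();
        String[] parts = input.split("(?=[0-5])");
        long elapsedMs = (System.nanoTime() - t0) / 1_000_000;
        long after = usedMemory();

        System.out.println("parts = " + parts.length
                + "\napprox. bytes retained after split = " + (after - before)
                + "\nsplit time = " + elapsedMs + "ms");
    }
}
```

Running it several times and comparing the numbers gives a better feel for the noise than any single run.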
Bohemian answered Oct 12 '22 23:10