Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Comparison of String and StringBuilder manipulation in terms of memory usage

According to SCJP Study Guide by KathySierra:

The java.lang.StringBuffer and java.lang.StringBuilder classes should be used when you have to make modifications to strings of characters. As we discussed, String objects are immutable, so if you choose to do a lot of manipulations with String objects, you will end up with a lot of abandoned String objects in the String pool

To clear out this, I have gone through the code of String class and StringBuilder source here.

The simplfied code of String looks like this:

public final class String(){
     private final char [] value; //Final helps in immutability, I guess.
     public String (String original){
         value = original.value;
      }
}

And StringBuilder's simplified version look like this:

public final class StringBuilder{
    char [] value;
    public StringBuilder(String str) {
        value = new Char[(str.length() + 16)]; // 16 here is implementation dependent.
    append(str);
}

public StringBuilder append(String str){

            //Add 'str' characters in value array if its size allows,
        else
            // Create new array of size newCapacity and copy contents of 'value' in that.
            //value = Arrays.copyOf(value, newCapacity);// here old array object is lost.

        return this;
    }
}

So let's say we have a case as under:

Using String class:

String s1 = "abc"; // Creates one object on String pool.
s1 = s1+"def"; // Creates two objects - "def " on String pool
// and "abcdef" on the heap.

If I use StringBuilder, the code becomes:

 StringBuilder s1 = StringBuilder("abc");

 // Creates one String object "abc " on String pool.
 // And one StringBuilder object "abc" on the heap.
 s1.append("def");
 // Creates one string object "def" on String pool.
 // And changes the char [] inside StringBuilder to hold "def" also.

In StringBuilder s2 = s1.append("def"); there are equal chances that the char array holding the string will be replaced by a new char array. The old array is reference less now and will be garbage collected.

My Query is:

Using simple String concatenation and StringBuilder append() method, the number of String objects that go on to the String pool is same.

And according to the code listed above, StringBuilder does make use of bigger char arrays in the first place while the String object contains a char array of the same length as the string it is holding.

  1. How is the usage of StringBuilder more efficient than normal String class for string manipulations?
  2. And is the statement given in SCJP Guide wrong?
like image 248
Aman Arora Avatar asked Dec 16 '22 03:12

Aman Arora


1 Answers

The key is the expandCapacity function:

void expandCapacity(int minimumCapacity) {
    int newCapacity = (value.length + 1) * 2; //important part here
    if (newCapacity < 0) {
        newCapacity = Integer.MAX_VALUE;
    } else if (minimumCapacity > newCapacity) {
        newCapacity = minimumCapacity;
    }
    value = Arrays.copyOf(value, newCapacity);
}

By resizing the underlying array by a factor of 2 every time a resize is needed, the amortized time needed to append 1 character is minimized.

Wikipedia has a good explanation:

As n elements are inserted, the capacities form a geometric progression. Expanding the array by any constant proportion ensures that inserting n elements takes O(n) time overall, meaning that each insertion takes amortized constant time. The value of this proportion a leads to a time-space tradeoff: the average time per insertion operation is about a/(a−1), while the number of wasted cells is bounded above by (a−1)n. The choice of a depends on the library or application: some textbooks use a = 2, but Java's ArrayList implementation uses a = 3/2 and the C implementation of Python's list data structure uses a = 9/8.

Many dynamic arrays also deallocate some of the underlying storage if its size drops below a certain threshold, such as 30% of the capacity. This threshold must be strictly smaller than 1/a in order to support mixed sequences of insertions and removals with amortized constant cost.

Dynamic arrays are a common example when teaching amortized analysis.

Now for your particular example it would not make a difference, but you will see the effects when appending lots of characters:

public static void main(String[] args){
    int numAppends = 200000;
    int numRepetitions = 3;
    long[] time1 = new long[numRepetitions];
    long[] time2 = new long[numRepetitions];
    long now;
    for (int k = 0; k < numRepetitions; k++){
        String s = "";
        now = System.nanoTime();
        for(int i = 0; i < numAppends ; i++) {
            s = s + "a";
        }
        time1[k] = (System.nanoTime() - now) / 1000000;
        StringBuilder sb = new StringBuilder();
        now = System.nanoTime();
        for(int i = 0; i < numAppends ; i++) {
            sb.append("a");     
        }
        time2[k] = (System.nanoTime() - now) / 1000000;
        System.out.println("Rep "+k+", time1: "+time1[k]+ " ms, time2: " + time2[k] + " ms");
    }
}

Output:

Rep 0, time1: 13509 ms, time2: 7 ms
Rep 1, time1: 13086 ms, time2: 1 ms
Rep 2, time1: 13167 ms, time2: 1 ms

Also, I counted the number of times the Arrays.copyOf() method is called inside extendCapacity() for the benchmark: It's 49 times on the first iteration, but only 15 times on the second and third iterations. The output is as follows:

newCapacity: 34
newCapacity: 70
newCapacity: 142
newCapacity: 286
newCapacity: 574
newCapacity: 1150
newCapacity: 2302
newCapacity: 4606
newCapacity: 9214
newCapacity: 18430
newCapacity: 36862
newCapacity: 73726
newCapacity: 147454
newCapacity: 294910
newCapacity: 42
Rep 2, time1: 12955 ms, time2: 134 ms
like image 177
jmiserez Avatar answered Dec 17 '22 15:12

jmiserez