 

Best way to modify an existing string? StringBuilder or convert to char array and back to string?

Tags: java, string

I'm learning Java and am wondering what's the best way to modify strings (both for performance and to learn the preferred method in Java). Assume you're looping through a string and checking each character or performing some action at each index of the string.

Do I use the StringBuilder class, or convert the string into a char array, make my modifications, and then convert the char array back to a string?

Example for StringBuilder:

StringBuilder newString = new StringBuilder(oldString);
for (int i = 0; i < oldString.length(); i++) {
    newString.setCharAt(i, 'X');
}

Example for char array conversion:

char[] newStringArray = oldString.toCharArray();
for (int i = 0; i < oldString.length(); i++) {
    newStringArray[i] = 'X';
}
String newString = String.valueOf(newStringArray);

What are the pros/cons of each approach?

I take it that StringBuilder is going to be more efficient, since converting to a char array makes a copy of the array each time you update an index.

user797963 asked Oct 09 '13 18:10


3 Answers

I say do whatever is most readable/maintainable until you know that String "modification" is slowing you down. To me, this is the most readable:

String s = "foo";
s += "bar";
s += "baz";

If that's too slow, I'd use a StringBuilder. You may want to compare it to StringBuffer: if performance matters and synchronization does not, StringBuilder should be faster; if synchronization is needed, use StringBuffer.
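
A minimal sketch of that switch, reusing the "foo"/"bar"/"baz" values from above. The class and method names are made up for illustration. Both builders expose the same append() API; the only difference is that StringBuffer's methods are synchronized.

```java
class AppendExample {
    // Same result as the += version, but all appends go into one mutable buffer.
    static String withBuilder() {
        StringBuilder sb = new StringBuilder("foo");
        sb.append("bar").append("baz"); // append() returns the same builder, so calls chain
        return sb.toString();           // one final copy into an immutable String
    }

    // Identical usage with StringBuffer, whose methods are synchronized.
    static String withBuffer() {
        StringBuffer sb = new StringBuffer("foo");
        sb.append("bar").append("baz");
        return sb.toString();
    }
}
```

Swapping one for the other is a one-word change, which makes it easy to measure both if it matters.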

Also, it's important to know that these strings are not actually being modified: in Java, Strings are immutable.
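
A small sketch of what immutability means for `s += ...`: the variable is re-pointed at a brand-new String, while any other reference to the old String still sees the old value. The class name is hypothetical.

```java
class ImmutabilityDemo {
    // Returns {alias, s} after "modifying" s with +=.
    static String[] demo() {
        String s = "foo";
        String alias = s; // both variables refer to the same String object
        s += "bar";       // builds a new String; only s is re-pointed at it
        // alias still refers to the original, unchanged "foo"
        return new String[] { alias, s };
    }
}
```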


This is all context specific. If you optimize this code and it doesn't make a noticeable difference (and this is usually the case), then you just thought longer than you had to and you probably made your code more difficult to understand. Optimize when you need to, not because you can. And before you do that, make sure the code you're optimizing is the cause of your performance issue.

Daniel Kaplan answered Nov 13 '22 11:11


What are the pros/cons of each approach? I take it that StringBuilder is going to be more efficient since converting to a char array makes copies of the array each time you update an index.

As written, the code in your second example will create just two arrays: one when you call toCharArray(), and another when you call String.valueOf() (a String keeps its own private copy of the data: a char[] up to Java 8, a byte[] since the compact strings change in Java 9). The element manipulations you are performing should not trigger any object allocations. No copies of the array are made when you read or write an element.
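
A short sketch that makes the two copies visible, using a hypothetical class name and a made-up input: toCharArray() hands back a fresh array (so the original String is untouched), element writes just store into that array, and String.valueOf() takes its own copy (so later writes to the array don't leak into the result).

```java
class CharArrayCopies {
    // Returns {original, result} to show neither is affected by array writes.
    static String[] demo() {
        String original = "hello";
        char[] chars = original.toCharArray();  // copy #1: a new char[] owned by us
        for (int i = 0; i < chars.length; i++) {
            chars[i] = 'X';                     // plain element store, no allocation
        }
        String result = String.valueOf(chars);  // copy #2: the String copies the array
        chars[0] = 'Y';                         // does not change result
        return new String[] { original, result };
    }
}
```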

If you are going to be doing any sort of String manipulation, the recommended practice is to use a StringBuilder. If you are writing very performance-sensitive code, and your transformation does not alter the length of the string, then it might be worthwhile to manipulate the array directly. But since you are learning Java as a new language, I am going to guess that you are not working in high frequency trading or any other environment where latency is critical. Therefore, you are probably better off using a StringBuilder.

If you are performing any transformations that might yield a string of a different length than the original, you should almost certainly use a StringBuilder; it will resize its internal buffer as necessary.
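
For instance, here is a sketch of a length-changing transformation, a hypothetical helper that expands each tab into four spaces. The input length is only a capacity hint; the builder grows its buffer on its own when the output gets longer.

```java
class ExpandTabs {
    // Hypothetical helper: replaces every '\t' with four spaces.
    static String expandTabs(String input) {
        // Initial capacity hint; StringBuilder resizes automatically if exceeded.
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\t') {
                sb.append("    "); // output grows by more than one char here
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Doing the same with a char[] would mean computing the final length up front or reallocating by hand.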

On a related note, if you are doing simple string concatenation (e.g., s = "a" + someObject + "c"), the compiler will actually transform those operations into a chain of StringBuilder.append() calls (since Java 9, javac instead emits an invokedynamic call to StringConcatFactory, which serves the same purpose), so you are free to use whichever you find more aesthetically pleasing. I personally prefer the + operator. However, if you are building up a string across multiple statements, you should create a single StringBuilder.

For example:

public String toString() {
    return "{field1 =" + this.field1 + 
           ",  field2 =" + this.field2 + 
           ...
           ",  field50 =" + this.field50 + "}";
}

Here, we have a single, long expression involving many concatenations. You don't need to worry about hand-optimizing this, because the compiler will use a single StringBuilder and just call append() on it repeatedly.
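
To make that translation concrete, here is the short `"a" + someObject + "c"` expression written both ways. The explicit builder chain is roughly what javac generated up to Java 8 (newer compilers use a different mechanism but produce the same string). Class and method names are illustrative only.

```java
class ConcatEquivalence {
    // The readable form.
    static String withPlus(Object someObject) {
        return "a" + someObject + "c";
    }

    // Roughly the pre-Java 9 compiled form: one builder, chained appends.
    static String withBuilder(Object someObject) {
        return new StringBuilder().append("a").append(someObject).append("c").toString();
    }
}
```

Both handle null the same way, appending the text "null".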

String s = ...;
if (someCondition) {
    s += someValue;
}
s += additionalValue;
return s;

Here, you'll end up with two StringBuilders being created under the covers, but unless this is an extremely hot code path in a latency-critical application, it's really not worth fretting about. Given similar code, but with many more separate concatenations, it might be worth optimizing. Same goes if you know the strings might be very large. But don't just guess--measure! Demonstrate that there's a performance problem before you try to fix it. (Note: this is just a general rule for "micro optimizations"; there's rarely a downside to explicitly using a StringBuilder. But don't assume it will make a measurable difference: if you're concerned about it, you should actually measure.)
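
If you did decide to optimize the conditional example, a sketch with one explicit StringBuilder might look like this. The parameters simply stand in for the `s`, `someCondition`, `someValue`, and `additionalValue` placeholders above; the class and method names are hypothetical.

```java
class ConditionalAppend {
    // Same logic as the two += statements, using a single builder.
    static String build(String s, boolean someCondition, String someValue, String additionalValue) {
        StringBuilder sb = new StringBuilder(s);
        if (someCondition) {
            sb.append(someValue);
        }
        sb.append(additionalValue);
        return sb.toString();
    }
}
```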

String s = "";
for (final Object item : items) {
    s += item + "\n";
}

Here, we're performing a separate concatenation operation on each loop iteration, which means a new StringBuilder will be allocated on each pass (and the entire accumulated string is copied each time, so the total work grows quadratically with the number of items). In this case, it's probably worth using a single StringBuilder since you may not know how large the collection will be. I would consider this an exception to the "prove there's a performance problem before optimizing" rule: if the operation has the potential to explode in complexity based on input, err on the side of caution.
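
A sketch of the loop rewritten around a single StringBuilder. It takes an `Iterable<?>` as a stand-in for whatever `items` is in your code; the class and method names are hypothetical.

```java
class JoinLines {
    // Appends each item plus a newline into one builder, so each iteration
    // only adds new characters instead of re-copying everything so far.
    static String joinLines(Iterable<?> items) {
        StringBuilder sb = new StringBuilder();
        for (final Object item : items) {
            sb.append(item).append('\n'); // append(Object) uses String.valueOf(item)
        }
        return sb.toString();
    }
}
```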

Mike Strobel answered Nov 13 '22 11:11


Which option will perform the best is not an easy question.

I did a benchmark using Caliper:

benchmark       runtime (ns)
array           88
builder         126
builderTillEnd  76
concat          3435

Benchmarked methods:

public static String array(String input)
{
    char[] result = input.toCharArray(); // COPYING
    for (int i = 0; i < input.length(); i++)
    {
        result[i] = 'X';
    }
    return String.valueOf(result); // COPYING
}

public static String builder(String input)
{
    StringBuilder result = new StringBuilder(input); // COPYING
    for (int i = 0; i < input.length(); i++)
    {
        result.setCharAt(i, 'X');
    }
    return result.toString(); // COPYING
}

public static StringBuilder builderTillEnd(String input)
{
    StringBuilder result = new StringBuilder(input); // COPYING
    for (int i = 0; i < input.length(); i++)
    {
        result.setCharAt(i, 'X');
    }
    return result;
}

public static String concat(String input)
{
    String result = "";
    for (int i = 0; i < input.length(); i++) 
    {
        result += 'X'; // terrible COPYING, COPYING, COPYING... same as:
                       // result = new StringBuilder(result).append('X').toString();
    }
    return result;
}

Remarks

  1. If we want to modify a String, we have to make at least one copy of the input String, because Strings in Java are immutable.

  2. java.lang.StringBuilder extends java.lang.AbstractStringBuilder. StringBuilder.setCharAt() is inherited from AbstractStringBuilder; in JDK 8 it looks like this:

    public void setCharAt(int index, char ch) {
        if ((index < 0) || (index >= count))
            throw new StringIndexOutOfBoundsException(index);
        value[index] = ch;
    }
    

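    In JDK 8, AbstractStringBuilder internally uses a plain char array, char[] value (since Java 9 it is a byte[] because of compact strings). So result[i] = 'X' is very similar to result.setCharAt(i, 'X'); however, the latter goes through a method call (which the JVM will probably inline) and performs its own explicit bounds check against the builder's length, on top of the array's implicit check, so it can be a bit slower.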
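
The explicit check exists because a builder's internal buffer is usually larger than its logical length. A small sketch (hypothetical class name) showing that setCharAt() rejects an index that fits in the buffer but lies past length(). The Javadoc documents the initial capacity of `new StringBuilder(str)` as 16 plus the string's length.

```java
class SetCharAtBounds {
    // Returns true if setCharAt refuses an index beyond length(),
    // even though the internal buffer has room for it.
    static boolean rejectsIndexPastLength() {
        StringBuilder sb = new StringBuilder("abc"); // length() == 3, capacity() == 19
        if (sb.capacity() <= 5) {
            throw new IllegalStateException("expected spare capacity");
        }
        try {
            sb.setCharAt(5, 'X'); // inside the buffer, but not below length()
            return false;
        } catch (IndexOutOfBoundsException e) {
            // In practice this is a StringIndexOutOfBoundsException.
            return true;
        }
    }
}
```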

Conclusions

  1. If you can operate on a StringBuilder until the end (you don't need a String back), do it. It's the preferred approach, and in this benchmark it was also the fastest.

  2. If you need a String at the end and this is the bottleneck of your program, then you might consider using a char array. In the benchmark, the char array version was roughly 30% faster than StringBuilder (88 ns vs. 126 ns). Be sure to properly measure the execution time of your program before and after the optimization, because there is no guarantee you will see the same gain.

  3. Never concatenate Strings in a loop with + or += unless you really know what you are doing. Usually it's better to use an explicit StringBuilder and append().
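
Conclusion 1 in practice: StringBuilder implements CharSequence, so code that only needs to read characters can accept the builder directly and skip the final toString() copy. A sketch reusing the benchmark's builderTillEnd method, plus a hypothetical countChar consumer:

```java
class KeepBuilder {
    // Same transformation as builderTillEnd in the benchmark above.
    static StringBuilder builderTillEnd(String input) {
        StringBuilder result = new StringBuilder(input);
        for (int i = 0; i < input.length(); i++) {
            result.setCharAt(i, 'X');
        }
        return result;
    }

    // Hypothetical consumer that accepts any CharSequence (String, StringBuilder, ...).
    static int countChar(CharSequence cs, char target) {
        int count = 0;
        for (int i = 0; i < cs.length(); i++) {
            if (cs.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }
}
```

Usage: `KeepBuilder.countChar(KeepBuilder.builderTillEnd("hello"), 'X')` reads the builder in place without creating a String.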

Adam Stelmaszczyk answered Nov 13 '22 10:11