
Slow string concatenation over large input

I've written an n-ary tree ADT which works fine. However, I need to store its serialization in a variable in a calling class, e.g.:

    DomTree<String> a = Data.createTreeInstance("very_large_file.xml");
    String x = a.toString();

I've written a method which serves the purpose exactly how I need it, but on very large inputs it takes forever (20 minutes on a 100 MB XML file). I have timed the methods: building the tree from the XML file is quick, but calling toString() as shown above is very slow.

@Override
public String toString(){
    return printTree(this);
}

public String printTree(AbstractTree<E> tree){
    if (tree.isLeaf()){
        return tree.getNodeName();
    }else{
        String tStr = tree.getNodeName() + "(";

        int i = 0;
        Iterator<AbstractTree<E>> child = tree.getChildren().iterator();
        while (i < tree.getChildren().size() - 1){

            tStr += printTree(child.next()) + ", ";
            i++;
        }
        tStr += printTree(child.next()) + ")";

        return tStr;    
    }
}

I'm guessing it is to do with the way the string is built up rather than how the tree is traversed? Is there a better way to do this?

UPDATE: Following skaffman's example, the following code gives an OutOfMemoryError for very large input.

@Override
public String toString(){
    StringBuilder buffer = new StringBuilder();
    printTree(this, buffer);
    return buffer.toString();

}

public String printTree(AbstractTree<E> tree, StringBuilder buffer){
    if (tree.isLeaf()){
        return tree.getNodeName();
    }else{
        buffer.append(tree.getNodeName());
        buffer.append("(");

        int i = 0;
        Iterator<AbstractTree<E>> child = tree.getChildren().iterator();
        while (i < tree.getChildren().size() - 1){

            buffer.append(printTree(child.next(), buffer));
            buffer.append(", ");
            i++;
        }
        buffer.append(printTree(child.next(), buffer)); 
        buffer.append(")");

        return buffer.toString();   
    }
}

UPDATE: Works perfectly now, using skaffman's example.

Robert asked Jul 14 '09 16:07



3 Answers

String concats like that are punishingly slow. Use a StringBuilder.

@Override
public String toString(){
    StringBuilder buffer = new StringBuilder();
    printTree(this, buffer);
    return buffer.toString();
}

public void printTree(AbstractTree<E> tree, StringBuilder buffer){
    if (tree.isLeaf()){
        buffer.append(tree.getNodeName());
    } else {
        buffer.append(tree.getNodeName());
        buffer.append("(");

        int i = 0;
        Iterator<AbstractTree<E>> child = tree.getChildren().iterator();
        while (i < tree.getChildren().size() - 1){
            printTree(child.next(), buffer);
            buffer.append(", ");
            i++;
        }
        printTree(child.next(), buffer); 
        buffer.append(")");
    }
}
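
For anyone who wants to try this outside the original project, here is a minimal self-contained sketch of the same idea. `SketchNode` is a hypothetical stand-in for the question's `AbstractTree<E>` (only `getNodeName()`, `getChildren()` and `isLeaf()` are modelled, and `isLeaf()` is assumed to mean "has no children"). The important detail is that `printTree` returns `void` and only ever appends to the shared buffer. The version in the question's update appends to the buffer and also returns `buffer.toString()`, which the caller then appends again, so the buffer keeps re-absorbing its own contents. That is probably where the OutOfMemoryError came from.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the question's AbstractTree<E>; only the three
// methods that printTree relies on are modelled here.
class SketchNode {
    private final String name;
    private final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name) { this.name = name; }

    // Returns 'this' so small trees can be built fluently.
    SketchNode add(SketchNode child) { children.add(child); return this; }

    String getNodeName() { return name; }
    List<SketchNode> getChildren() { return children; }
    boolean isLeaf() { return children.isEmpty(); } // assumption: leaf == no children
}

class TreePrintSketch {
    // Appends the serialization of 'tree' to the shared buffer. Nothing is
    // returned, so no intermediate String is created during the traversal.
    static void printTree(SketchNode tree, StringBuilder buffer) {
        buffer.append(tree.getNodeName());
        if (tree.isLeaf()) {
            return;
        }
        buffer.append('(');
        Iterator<SketchNode> it = tree.getChildren().iterator();
        while (it.hasNext()) {
            printTree(it.next(), buffer);
            if (it.hasNext()) {
                buffer.append(", "); // separator only between siblings
            }
        }
        buffer.append(')');
    }

    static String serialize(SketchNode root) {
        StringBuilder buffer = new StringBuilder();
        printTree(root, buffer);
        return buffer.toString(); // the only String built from the buffer
    }

    public static void main(String[] args) {
        SketchNode root = new SketchNode("a")
                .add(new SketchNode("b").add(new SketchNode("d")))
                .add(new SketchNode("c"));
        System.out.println(serialize(root)); // prints a(b(d), c)
    }
}
```

Using `hasNext()` to decide whether a separator is needed is a stylistic choice in this sketch; it avoids the separate counter, but the counter-based loop above does the same job.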
skaffman answered Oct 24 '22 07:10


Don't use string concatenation in loops. It does not scale.

Use StringBuilder instead. Unlike string concatenation, it does not create a new object on every append.

void print() {
    StringBuilder sb = new StringBuilder();
    sb.append("hello");
    sb.append(" World!");
    System.out.println(sb.toString());
}
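
Since the advice is about loops, here is a rough sketch of the loop case. The class and method names (`LoopAppendSketch`, `joinWithBuilder`) are made up for illustration. It builds a separator-joined string with a single StringBuilder instead of `+=` inside the loop.

```java
import java.util.List;

class LoopAppendSketch {
    // Joins 'parts' with 'separator' using one StringBuilder for the whole loop,
    // so each piece is appended once rather than the whole string being rebuilt.
    static String joinWithBuilder(List<String> parts, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append(separator); // separator goes between elements only
            }
            sb.append(parts.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWithBuilder(java.util.Arrays.asList("a", "b", "c"), ", "));
    }
}
```

For this particular flat case the standard library's `String.join(separator, parts)` (Java 8+) gives the same result. The explicit builder is mainly useful when the loop body is more involved, as in a recursive tree traversal.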

raoulsson answered Oct 24 '22 08:10


The reason string concatenation is slow is that strings are immutable. Every time you write "+=", a new String is created and the existing characters are copied into it. So the way you build up your string is, in the worst case, O(n^2): if you appended one char at a time, the cost of building the new strings would be 2 + 3 + 4 + ... + n, which is O(n^2).
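
To put numbers on that sum, here is a small sketch that just counts characters rather than timing anything. It models the worst case only, assuming every `+=` copies the whole new string with no compiler or JIT help. The class and method names are made up for illustration.

```java
class ConcatCostSketch {
    // Models growing a string one char at a time with +=, where each step
    // writes out the entire new string (lengths 2, 3, ..., n).
    static long charsCopiedByRepeatedConcat(int n) {
        long copied = 0;
        for (int len = 2; len <= n; len++) {
            copied += len; // the new string of length 'len' is written in full
        }
        return copied;
    }

    // Closed form of 2 + 3 + ... + n, i.e. n(n+1)/2 - 1 (valid for n >= 1).
    static long closedForm(int n) {
        return (long) n * (n + 1) / 2 - 1;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 1_000, 100_000}) {
            System.out.println(n + " chars -> "
                    + charsCopiedByRepeatedConcat(n) + " chars copied");
        }
    }
}
```

The count grows with the square of the final length, so a 10x longer result costs roughly 100x more copying under this model.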

Use StringBuilder, as others suggest (rather than the slower but thread-safe StringBuffer).

I should add that StringBuilder gives you O(n) total time (amortized O(1) per append), because it is mutable and works like a vector (a growable array) behind the scenes. So build up your string there, and then call toString().

StringBuilder builder = new StringBuilder();
builder.append("blah"); // append more as needed.
String text = builder.toString();
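
To illustrate the "works like a vector" point, here is a toy growable buffer. This is not StringBuilder's actual implementation. The initial capacity of 16 and the plain doubling policy are assumptions made for the sketch, and the JDK's real growth policy may differ. It counts how many characters get moved during resizes, which stays below twice the final length.

```java
class GrowableCharBuffer {
    private char[] data = new char[16]; // assumed initial capacity
    private int size = 0;
    private long copied = 0;            // chars moved during resizes

    void append(char c) {
        if (size == data.length) {
            // Full: allocate double the capacity and move the existing chars once.
            char[] bigger = new char[data.length * 2];
            System.arraycopy(data, 0, bigger, 0, size);
            copied += size;
            data = bigger;
        }
        data[size++] = c;
    }

    int length() { return size; }

    long copiedDuringResizes() { return copied; }

    @Override
    public String toString() { return new String(data, 0, size); }

    public static void main(String[] args) {
        GrowableCharBuffer buf = new GrowableCharBuffer();
        int n = 1_000_000;
        for (int i = 0; i < n; i++) {
            buf.append('x');
        }
        System.out.println(n + " appends, " + buf.copiedDuringResizes()
                + " chars moved during resizes");
    }
}
```

Because each resize doubles the capacity, the resize copies form a geometric series, which is where the linear total comes from.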

I would also like to add that the problem is similar in Python. The idiom in Python is to append all the strings you want to concatenate to a list, and then join the list: "".join(the_list).

UPDATE: As Bill points out, concatenation is not the root of all evil. One-off string concatenations are fine, and may even be optimized! (They are also worst-case linear.) But when you are concatenating in a loop, as you are above, performance degrades drastically as the number of iterations goes up. In that case my analysis above holds, as I specifically stated it is "worst case", which means you assume no optimizations. (The JVM also can't optimize concatenation inside loops as well as it can outside them.)

Tom answered Oct 24 '22 08:10