Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient string concatenation in Scala

The JVM optimzes String concatenation with + and replaces it with a StringBuilder. This should be the same in Scala. But what happens if strings are concatenated with ++=?

var x = "x"
x ++= "y"
x ++= "z"

As far as I know this methods treats strings like char seqences, so even if the JVM would create a StringBuilder it would lead to many method calls, right? Would it be better to use a StringBuilder instead?

To what type is the String converted implicitly?

like image 988
deamon Avatar asked Sep 02 '14 16:09

deamon


People also ask

How do I merge two strings in scala?

Concatenating strings in Scala Scala provides concat() method to concatenate two strings, this method returns a new string which is created using two strings. You can also use '+' operator to concatenate two strings.

Is concatenation faster than join?

Doing N concatenations requires creating N new strings in the process. join() , on the other hand, only has to create a single string (the final result) and thus works much faster.

Which is faster string concatenation or the StringBuilder class?

Note that regular string concatenations are faster than using the StringBuilder but only when you're using a few of them at a time. If you are using two or three string concatenations, use a string.

Is StringBuilder better than concatenation?

It is always better to use StringBuilder. append to concatenate two string values.


2 Answers

There is a huge HUGE difference in time taken.

If you add strings repeatedly using += you do not optimize away the O(n^2) cost of creating incrementally longer strings. So for adding one or two you won't see a difference, but it doesn't scale; by the time you get to adding 100 (short) strings, using a StringBuilder is over 20x faster. (Precise data: 1.3 us vs. 27.1 us to add the string representations of the numbers 0 to 100; timings should be reproducible to about += 5% and of course are for my machine.)

Using ++= on a var String is far far worse yet, because you are then instructing Scala to treat a string as a character-by-character collection which then requires all sorts of wrappers to make the String look like a collection (including boxed character-by-character addition using the generic version of ++!). Now you're 16x slower again on 100 additions! (Precise data: 428.8 us for ++= on a var string instead of +='s 26.7 us.)

If you write a single statement with a bunch of +es then the Scala compiler will use a StringBuilder and end up with an efficient result (Data: 1.8 us on non-constant strings pulled out of an array).

So, if you add strings with anything other than + in line, and you care about efficiency, use a StringBuilder. Definitely don't use ++= to add another String to a var String; there just isn't any reason to do it, and there's a big runtime penalty.

(Note--very often you don't care at all how efficient your string additions are! Don't clutter your code with extra StringBuilders unless you have reason to suspect that this particular code path is getting called a lot.)

like image 138
Rex Kerr Avatar answered Sep 20 '22 12:09

Rex Kerr


Actually, the inconvenient truth is StringOps usually remains an allocation:

scala> :pa
// Entering paste mode (ctrl-D to finish)

class Concat {
    var x = "x"
    x ++= "y"
    x ++= "z"
}

// Exiting paste mode, now interpreting.

defined class Concat

scala> :javap -prv Concat
Binary file Concat contains $line3.$read$$iw$$iw$Concat
  Size 1211 bytes
  MD5 checksum 1900522728cbb0ed0b1d3f8b962667ad
  Compiled from "<console>"
public class $line3.$read$$iw$$iw$Concat
  SourceFile: "<console>"
[snip]


  public $line3.$read$$iw$$iw$Concat();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=6, locals=1, args_size=1
         0: aload_0       
         1: invokespecial #19                 // Method java/lang/Object."<init>":()V
         4: aload_0       
         5: ldc           #20                 // String x
         7: putfield      #10                 // Field x:Ljava/lang/String;
        10: aload_0       
        11: new           #22                 // class scala/collection/immutable/StringOps
        14: dup           
        15: getstatic     #28                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
        18: aload_0       
        19: invokevirtual #30                 // Method x:()Ljava/lang/String;
        22: invokevirtual #34                 // Method scala/Predef$.augmentString:(Ljava/lang/String;)Ljava/lang/String;
        25: invokespecial #36                 // Method scala/collection/immutable/StringOps."<init>":(Ljava/lang/String;)V
        28: new           #22                 // class scala/collection/immutable/StringOps
        31: dup           
        32: getstatic     #28                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
        35: ldc           #38                 // String y
        37: invokevirtual #34                 // Method scala/Predef$.augmentString:(Ljava/lang/String;)Ljava/lang/String;
        40: invokespecial #36                 // Method scala/collection/immutable/StringOps."<init>":(Ljava/lang/String;)V
        43: getstatic     #28                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
        46: invokevirtual #42                 // Method scala/Predef$.StringCanBuildFrom:()Lscala/collection/generic/CanBuildFrom;
        49: invokevirtual #46                 // Method scala/collection/immutable/StringOps.$plus$plus:(Lscala/collection/GenTraversableOnce;Lscala/collection/generic/CanBuildFrom;)Ljava/lang/Object;
        52: checkcast     #48                 // class java/lang/String
        55: invokevirtual #50                 // Method x_$eq:(Ljava/lang/String;)V

See more demonstration at this answer.

Edit: To say more, you're building the String on each reassignment, so, no you're not using a single StringBuilder.

However, the optimization is done by javac and not the JIT compiler, so to compare fruits of the same kind:

public class Strcat {
    public String strcat(String s) {
        String t = " hi ";
        String u = " by ";
        return s + t + u;    // OK
    }
    public String strcat2(String s) {
        String t = s + " hi ";
        String u = t + " by ";
        return u;            // bad
    }
}

whereas

$ scala
Welcome to Scala version 2.11.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_11).
Type in expressions to have them evaluated.
Type :help for more information.

scala> :se -Xprint:typer

scala> class K { def f(s: String, t: String, u: String) = s ++ t ++ u }
[[syntax trees at end of                     typer]] // <console>
def f(s: String, t: String, u: String): String = scala.this.Predef.augmentString(scala.this.Predef.augmentString(s).++[Char, String](scala.this.Predef.augmentString(t))(scala.this.Predef.StringCanBuildFrom)).++[Char, String](scala.this.Predef.augmentString(u))(scala.this.Predef.StringCanBuildFrom)

is bad. Or, worse, to unroll Rex's explanation:

  "abc" ++ "def"

  augmentString("abc").++[Char, String](augmentString("def"))(StringCanBuildFrom)

  collection.mutable.StringBuilder.newBuilder ++= new WrappedString(augmentString("def"))

  val b = collection.mutable.StringBuilder.newBuilder
  new WrappedString(augmentString("def")) foreach b.+=

As Rex explained, StringBuilder overrides ++=(String) but not Growable.++=(Traversable[Char]).

In case you've ever wondered what unaugmentString is for:

    28: invokevirtual #40                 // Method scala/Predef$.augmentString:(Ljava/lang/String;)Ljava/lang/String;
    31: invokevirtual #43                 // Method scala/Predef$.unaugmentString:(Ljava/lang/String;)Ljava/lang/String;
    34: invokespecial #46                 // Method scala/collection/immutable/WrappedString."<init>":(Ljava/lang/String;)V

And just to show that you do finally call unadorned +=(Char) but after boxing and unboxing:

  public final scala.collection.mutable.StringBuilder apply(char);
    flags: ACC_PUBLIC, ACC_FINAL
    Code:
      stack=2, locals=2, args_size=2
         0: aload_0       
         1: getfield      #19                 // Field b$1:Lscala/collection/mutable/StringBuilder;
         4: iload_1       
         5: invokevirtual #24                 // Method scala/collection/mutable/StringBuilder.$plus$eq:(C)Lscala/collection/mutable/StringBuilder;
         8: areturn       
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
               0       9     0  this   L$line10/$read$$iw$$iw$$anonfun$1;
               0       9     1     x   C
      LineNumberTable:
        line 9: 0

  public final java.lang.Object apply(java.lang.Object);
    flags: ACC_PUBLIC, ACC_FINAL, ACC_BRIDGE, ACC_SYNTHETIC
    Code:
      stack=2, locals=2, args_size=2
         0: aload_0       
         1: aload_1       
         2: invokestatic  #35                 // Method scala/runtime/BoxesRunTime.unboxToChar:(Ljava/lang/Object;)C
         5: invokevirtual #37                 // Method apply:(C)Lscala/collection/mutable/StringBuilder;
         8: areturn       
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
               0       9     0  this   L$line10/$read$$iw$$iw$$anonfun$1;
               0       9     1    v1   Ljava/lang/Object;
      LineNumberTable:
        line 9: 0

A good laugh does get some oxygen into the bloodstream.

like image 41
som-snytt Avatar answered Sep 21 '22 12:09

som-snytt