I am considering submitting an RFE (request for enhancement) to the Oracle Bug Database that should significantly improve string concatenation performance. Before I do, I'd like to hear experts' comments on whether it makes sense.
The idea is based on the fact that the existing String.concat(String) is about twice as fast as StringBuilder when concatenating 2 strings. The problem is that there is no method to concatenate 3 or more strings. External methods cannot do this as efficiently because String.concat uses the package-private constructor String(int offset, int count, char[] value), which does not copy the char array but uses it directly. This ensures high String.concat performance. Although StringBuilder is in the same package, it still cannot use this constructor, because the String's char array would then be exposed to modification.
I suggest adding the following methods to String:
public static String concat(String s1, String s2)
public static String concat(String s1, String s2, String s3)
public static String concat(String s1, String s2, String s3, String s4)
public static String concat(String s1, String s2, String s3, String s4, String s5)
public static String concat(String s1, String... array)
Note: this kind of overloading is used in EnumSet.of, for efficiency.
This is the implementation of one of the methods; the others work the same way:
public final class String {
    private final char value[];
    private final int count;
    private final int offset;

    // Package-private: adopts the given array without copying it.
    String(int offset, int count, char value[]) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Allocates the result buffer once and copies each argument into place.
    public static String concat(String s1, String s2, String s3) {
        char buf[] = new char[s1.count + s2.count + s3.count];
        System.arraycopy(s1.value, s1.offset, buf, 0, s1.count);
        System.arraycopy(s2.value, s2.offset, buf, s1.count, s2.count);
        System.arraycopy(s3.value, s3.offset, buf, s1.count + s2.count, s3.count);
        return new String(0, buf.length, buf);
    }
    // ... other members omitted
}
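The varargs overload proposed above could follow the same pattern. The following is only a rough sketch written outside java.lang, so it cannot reach the package-private constructor: the class name ConcatSketch is hypothetical, and the public String(char[]) constructor used here copies the buffer one more time, which the proposal is designed to avoid.

```java
// Hypothetical helper class for illustration; not part of the JDK.
final class ConcatSketch {
    private ConcatSketch() {}

    /** Concatenates first and every element of rest using one buffer allocation. */
    static String concat(String first, String... rest) {
        // First pass: compute the exact length of the result.
        int total = first.length();
        for (String s : rest) {
            total += s.length();
        }
        // Second pass: copy each string into its final position.
        char[] buf = new char[total];
        first.getChars(0, first.length(), buf, 0);
        int pos = first.length();
        for (String s : rest) {
            s.getChars(0, s.length(), buf, pos);
            pos += s.length();
        }
        // Unlike the package-private constructor, this public one copies buf again.
        return new String(buf);
    }

    public static void main(String[] args) {
        System.out.println(concat("a", "bc", "def")); // prints abcdef
    }
}
```

Note that this sketch throws a NullPointerException for null arguments, whereas the + operator converts null to the text "null"; a real implementation would have to decide which behaviour to follow.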
Also, once these methods are added to String, the Java compiler will be able to translate
String s = s1 + s2 + s3;
into the efficient
String s = String.concat(s1, s2, s3);
instead of the current, less efficient
String s = (new StringBuilder(String.valueOf(s1))).append(s2).append(s3).toString();
UPDATE: Performance test. I ran it on my notebook (Intel Celeron 925), concatenating 3 strings; my String2 class emulates exactly how this would work in the real java.lang.String. The string lengths are chosen to put StringBuilder in the most unfavourable conditions, that is, when it needs to expand its internal buffer capacity on each append, while concat always creates the char[] only once.
public class String2 {
    private final char value[];
    private final int count;
    private final int offset;

    // Copies the characters of an ordinary String into this emulation class.
    String2(String s) {
        value = s.toCharArray();
        offset = 0;
        count = value.length;
    }

    // Mirrors the package-private String constructor: adopts the array without copying.
    String2(int offset, int count, char value[]) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    public static String2 concat(String2 s1, String2 s2, String2 s3) {
        char buf[] = new char[s1.count + s2.count + s3.count];
        System.arraycopy(s1.value, s1.offset, buf, 0, s1.count);
        System.arraycopy(s2.value, s2.offset, buf, s1.count, s2.count);
        System.arraycopy(s3.value, s3.offset, buf, s1.count + s2.count, s3.count);
        return new String2(0, buf.length, buf);
    }

    public static void main(String[] args) {
        String s1 = "1";
        String s2 = "11111111111111111";
        String s3 = "11111111111111111111111111111111111111111";
        String2 s21 = new String2(s1);
        String2 s22 = new String2(s2);
        String2 s23 = new String2(s3);
        long t0 = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            // Version 1: single-allocation concat.
            String2 s = String2.concat(s21, s22, s23);
            // Version 2: swap this line in for the one above to time StringBuilder.
            // String s = new StringBuilder(s1).append(s2).append(s3).toString();
        }
        System.out.println(System.currentTimeMillis() - t0);
    }
}
On 1,000,000 iterations the results are:
version 1 (String2.concat) = ~200 ms
version 2 (StringBuilder) = ~400 ms
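A single timed loop like this one also measures JIT warm-up, and a result that is never used may be optimized away. Below is a rough sketch of a slightly more careful harness, not the benchmark that produced the numbers above. The class ConcatTiming and its methods are hypothetical names, and chained String.concat calls merely stand in for the proposed three-argument method, which does not exist in the JDK.

```java
import java.util.function.Supplier;

// Hypothetical timing harness; names are illustrative, not from the original post.
final class ConcatTiming {
    // Written after each run so the JIT cannot discard the measured work.
    static volatile int sink;

    /** Runs op reps times and returns the elapsed time in nanoseconds. */
    static long time(Supplier<String> op, int reps) {
        int total = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            total += op.get().length();
        }
        long elapsed = System.nanoTime() - t0;
        sink = total;
        return elapsed;
    }

    static String viaBuilder(String s1, String s2, String s3) {
        return new StringBuilder(s1).append(s2).append(s3).toString();
    }

    // Stand-in for the proposed three-argument concat (an assumption of this sketch).
    static String viaChainedConcat(String s1, String s2, String s3) {
        return s1.concat(s2).concat(s3);
    }

    public static void main(String[] args) {
        String s1 = "1";
        String s2 = "11111111111111111";
        String s3 = "11111111111111111111111111111111111111111";
        Supplier<String> builder = () -> viaBuilder(s1, s2, s3);
        Supplier<String> chained = () -> viaChainedConcat(s1, s2, s3);

        // Warm-up: give the JIT a chance to compile both paths before timing.
        time(builder, 200_000);
        time(chained, 200_000);

        System.out.println("StringBuilder:  " + time(builder, 1_000_000) / 1_000_000 + " ms");
        System.out.println("chained concat: " + time(chained, 1_000_000) / 1_000_000 + " ms");
    }
}
```

Even with a warm-up phase, hand-rolled loops remain fragile; a dedicated benchmarking framework is the safer choice for numbers that will accompany an RFE.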
If you are concatenating a list of strings, the preferred way is to use join(): it accepts a list of strings, concatenates them, and is the most readable option in this case. If you are looking for performance, append/join is marginally faster when you are working with extremely long strings.
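Assuming the join() mentioned here corresponds to Java's String.join (available since Java 8), a minimal sketch looks like this; the class name JoinExample and the sample data are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; only String.join itself is a JDK API.
final class JoinExample {
    /** Concatenates all parts with no separator between them. */
    static String joinAll(List<String> parts) {
        return String.join("", parts);
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("alpha", "beta", "gamma");
        System.out.println(joinAll(parts));           // prints alphabetagamma
        System.out.println(String.join(", ", parts)); // prints alpha, beta, gamma
    }
}
```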
The reason string concatenation is slow is that strings are immutable. Every time you write "+=", a new String is created and the characters accumulated so far are copied into it. This means that building up a string this way is, in the worst case, O(n^2).
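To make the quadratic behaviour concrete, here is a small sketch contrasting the two approaches. The class and method names are hypothetical; both methods return the same text, but the first copies the whole accumulated string on every iteration.

```java
// Hypothetical example class for illustration.
final class LoopConcat {
    /** Builds n characters with +=; each step copies everything built so far. */
    static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += "x"; // allocates a new String of length i + 1
        }
        return s;
    }

    /** Builds n characters in one mutable buffer sized up front. */
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus(5));    // prints xxxxx
        System.out.println(withBuilder(5)); // prints xxxxx
    }
}
```

With += the total number of characters copied grows roughly as 1 + 2 + ... + n, which is where the O(n^2) bound comes from; the builder version appends each character once.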
The concat() method can be better than the '+' operator because it creates a new object only when the length of the argument string is greater than zero, so it can use less memory. The + operator always creates a new string regardless of the operands' lengths (unless the whole expression is a compile-time constant), so it takes more memory.
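The empty-argument case can be checked directly, since the String.concat documentation states that the receiver itself is returned when the argument has length 0. A minimal sketch, with a hypothetical class name:

```java
// Hypothetical example class for illustration.
final class ConcatEmpty {
    /** Returns true if concat with an empty argument hands back the very same object. */
    static boolean concatReturnsSameObject(String s) {
        return s.concat("") == s; // identity comparison is intentional here
    }

    public static void main(String[] args) {
        String s = new String("hello");
        System.out.println(concatReturnsSameObject(s)); // prints true
    }
}
```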
The fact is, the use cases in which the performance of a single string concatenation expression matters are not that common. In most cases where performance is bound by string concatenation, it happens in a loop, building the end product step by step, and in that context the mutable StringBuilder is a clear winner. This is why I don't see much of a future for a proposal that optimizes a minority concern by intervening in the fundamental String class. But anyway, as far as comparing performance goes, your approach does have a significant edge:
import java.util.Random;

import com.google.caliper.Runner;
import com.google.caliper.SimpleBenchmark;

public class Performance extends SimpleBenchmark
{
    final Random rnd = new Random();
    final String as1 = "aoeuaoeuaoeu", as2 = "snthsnthnsth", as3 = "3453409345";
    final char[] c1 = as1.toCharArray(), c2 = as2.toCharArray(), c3 = as3.toCharArray();

    // Emulates the proposed String.concat on raw char arrays.
    public static char[] concat(char[] s1, char[] s2, char[] s3) {
        char buf[] = new char[s1.length + s2.length + s3.length];
        System.arraycopy(s1, 0, buf, 0, s1.length);
        System.arraycopy(s2, 0, buf, s1.length, s2.length);
        System.arraycopy(s3, 0, buf, s1.length + s2.length, s3.length);
        return buf;
    }

    // StringBuilder presized to the exact result length.
    public static String build(String s1, String s2, String s3) {
        final StringBuilder b = new StringBuilder(s1.length() + s2.length() + s3.length());
        b.append(s1).append(s2).append(s3);
        return b.toString();
    }

    // Plain + operator, as compiled by javac.
    public static String plus(String s1, String s2, String s3) {
        return s1 + s2 + s3;
    }

    public int timeConcat(int reps) {
        int tot = rnd.nextInt();
        for (int i = 0; i < reps; i++) tot += concat(c1, c2, c3).length;
        return tot;
    }

    public int timeBuild(int reps) {
        int tot = rnd.nextInt();
        for (int i = 0; i < reps; i++) tot += build(as1, as2, as3).length();
        return tot;
    }

    public int timePlus(int reps) {
        int tot = rnd.nextInt();
        for (int i = 0; i < reps; i++) tot += plus(as1, as2, as3).length();
        return tot;
    }

    public static void main(String... args) {
        Runner.main(Performance.class, args);
    }
}
Result:
0% Scenario{vm=java, trial=0, benchmark=Concat} 65.81 ns; σ=2.56 ns @ 10 trials
33% Scenario{vm=java, trial=0, benchmark=Build} 102.94 ns; σ=2.27 ns @ 10 trials
67% Scenario{vm=java, trial=0, benchmark=Plus} 160.14 ns; σ=2.94 ns @ 10 trials
benchmark    ns linear runtime
   Concat  65.8 ============
    Build 102.9 ===================
     Plus 160.1 ==============================
If you want them to take you seriously, you need to do the hard work of fully implementing, testing and thoroughly benchmarking your proposed change. And a full implementation would include the changes to the Java compiler to emit bytecodes to use your methods.
Write up the results, and then submit the code changes as a patch to OpenJDK 7 or 8.
My impression is that the Java developers don't have the resources to try out speculative ideas for optimizations like this one. An RFE without benchmarking results and code patches is unlikely to receive attention ...