
Efficiently append last chars to StringBuilder

Note: This question is about Java >= 9, which introduced "compact strings".


Let's say I am appending an unknown number of strings (or chars) to a StringBuilder and at some point determine that I am appending the last string.

How can this be done efficiently?

Background

If the capacity of the string builder is not large enough, it will always increase it to max(oldCap + str.length(), oldCap * 2 + 2). So if you are unlucky and the capacity is not enough for the last string, it will unnecessarily double the capacity, e.g.:

StringBuilder sb = new StringBuilder(4000);
sb.append("aaa..."); // 4000 * "a"
// Last string:
sb.append("b"); // Unnecessarily increases capacity from 4000 to 8002
return sb.toString();
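
To make the growth observable, here is a minimal runnable sketch of the snippet above. The class and method names (CapacityGrowthDemo, capacitiesAroundLastAppend) are made up for illustration, and the exact growth policy is an implementation detail of the JDK, so the numbers may differ between builds. On OpenJDK I would expect it to print 4000 -> 8002, matching the comment above.

```java
final class CapacityGrowthDemo {

    /** Returns {capacity before, capacity after} appending one char to an exactly full builder. */
    static int[] capacitiesAroundLastAppend(int initialCapacity) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        for (int i = 0; i < initialCapacity; i++) {
            sb.append('a'); // fills the buffer exactly, no reallocation yet
        }
        int before = sb.capacity();
        sb.append('b'); // one char too many: the builder has to grow
        return new int[] { before, sb.capacity() };
    }

    public static void main(String[] args) {
        int[] caps = capacitiesAroundLastAppend(4000);
        System.out.println(caps[0] + " -> " + caps[1]);
    }
}
```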

StringBuilder offers the methods capacity(), length() and getChars(...). However, manually creating a char[] and then creating a string from it would be inefficient because:

  • Due to "compact strings" the string builder has to convert its bytes to chars
  • When calling one of the String constructors the chars have to be compacted to bytes again

Another option would be to check capacity() and if necessary create a new StringBuilder(sb.length() + str.length()), then append sb and str:

StringBuilder sb = new StringBuilder(4000);
sb.append("aaa..."); // 4000 * "a"

String str = "b";
if (sb.capacity() - sb.length() < str.length()) {
    return new StringBuilder(sb.length() + str.length())
        .append(sb)
        .append(str)
        .toString();
}
else {
    return sb.append(str).toString();
}

The only disadvantage is that if the existing string builder or the new string contains non-Latin-1 characters (2 bytes per char), the newly created string builder starts out as Latin-1 (1 byte per char) and has to be "inflated" to 2 bytes per char.
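
For reference, the capacity check can be wrapped in a small helper so it is easy to try with both Latin-1 and non-Latin-1 input. This is only a sketch of the approach described above; LastAppend and appendLast are hypothetical names, not JDK API.

```java
final class LastAppend {

    /**
     * Appends the final string and returns the result.
     * If the builder has enough spare capacity, it is appended to directly;
     * otherwise an exactly sized builder is used and sb is left unchanged.
     */
    static String appendLast(StringBuilder sb, String last) {
        if (sb.capacity() - sb.length() >= last.length()) {
            return sb.append(last).toString();
        }
        return new StringBuilder(sb.length() + last.length())
            .append(sb)
            .append(last)
            .toString();
    }

    public static void main(String[] args) {
        StringBuilder full = new StringBuilder(4).append("aaaa");
        // Latin-1 only: the temporary builder never needs to be inflated
        System.out.println(appendLast(full, "b"));
        // "\u20AC" (euro sign) is not Latin-1, so the temporary builder
        // has to switch to 2 bytes per char while appending
        System.out.println(appendLast(full, "\u20AC"));
    }
}
```

Note that in the "not enough capacity" branch the original builder is not modified, so callers should not rely on sb containing the last string afterwards.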

asked Apr 25 '26 02:04 by Marcono1234


1 Answer

You are describing several different problems IMO, but none of them is an "actual" problem.

First is the fact that StringBuilder allocates too much space - that is rarely (if ever) a problem in practice. Think about any List/Set/Map - they do the same thing: they might allocate too much, and when you remove an element, they don't shrink their internal storage. Some of them do have a method for that; but so does StringBuilder:

 trimToSize
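
A minimal sketch of what that looks like (TrimDemo and capacitiesAroundTrim are made-up names for illustration). The Javadoc only says the buffer "may be resized", so the exact capacity after trimming is not guaranteed. Keep in mind that trimToSize() copies the data into a smaller array and toString() copies it again, so trimming right before toString() frees the slack sooner but does not save a copy.

```java
final class TrimDemo {

    /** Returns {capacity after growth, capacity after trimToSize(), length}. */
    static int[] capacitiesAroundTrim() {
        StringBuilder sb = new StringBuilder(4000);
        for (int i = 0; i < 4000; i++) {
            sb.append('a');
        }
        sb.append('b'); // forces the builder to grow past 4000
        int grown = sb.capacity();
        sb.trimToSize(); // asks the builder to drop unused capacity
        return new int[] { grown, sb.capacity(), sb.length() };
    }

    public static void main(String[] args) {
        int[] r = capacitiesAroundTrim();
        System.out.println("grown=" + r[0] + ", trimmed=" + r[1] + ", length=" + r[2]);
    }
}
```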

Due to "compact strings" the string builder has to convert its bytes to chars.

StringBuilder knows what it is storing via the coder field in AbstractStringBuilder which it extends. With compact Strings, String holds its data in a byte[] now (it has a coder too), thus I don't understand where that conversion from byte[] to char[] is supposed to happen. StringBuilder::toString is defined as:

public String toString() {
    // Create a copy, don't share the array
    return isLatin1() ? StringLatin1.newString(value, 0, count)
                      : StringUTF16.newString(value, 0, count);
}

Notice the isLatin1 check - StringBuilder knows what type of data it has internally; thus no conversion when possible.

I assume that by this:

When calling one of the String constructors the chars have to be compacted to bytes again

you mean:

char [] some = ...
String s = new String(some);

I don't know why you are using "again" here, but maybe I am missing something. Just notice that this conversion from char[] to byte[] indeed has to happen, but it's fairly trivial to do (the upper 8 bits of each char have to be zero), and as soon as a single char does not meet that precondition, the entire conversion is abandoned. So you either store all characters in LATIN1, or you don't.
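
To illustrate the all-or-nothing nature of that check, here is a simplified sketch. It is not the JDK's actual code (the real implementation lives in internal classes and is heavily optimized); CompressSketch and tryCompress are hypothetical names used only to show the idea.

```java
final class CompressSketch {

    /**
     * Tries to store every char in a single byte (Latin-1).
     * Returns null as soon as one char needs more than 8 bits,
     * in which case the caller would fall back to 2 bytes per char.
     */
    static byte[] tryCompress(char[] chars) {
        byte[] latin1 = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (c > 0xFF) {
                return null; // upper 8 bits are not zero: give up entirely
            }
            latin1[i] = (byte) c;
        }
        return latin1;
    }

    public static void main(String[] args) {
        System.out.println(tryCompress("abc".toCharArray()) != null);
        System.out.println(tryCompress("ab\u20AC".toCharArray()) != null);
    }
}
```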

answered Apr 27 '26 18:04 by Eugene


