What are the advantages of compact strings over compressed strings in JDK9?
Compact Strings in Java 9: Java 9 brought the concept of compact strings back. Whenever a String is created and all of its characters can be represented in a single byte each (the Latin-1 representation), a byte array is used internally, with one byte per character.
Run-length encoding, a simple string compression algorithm, replaces each run of consecutive identical characters with the character followed by the length of the run. For example, the string "aaaabbcddddd" is compressed to "a4b2c1d5".
String compression in Java can also be performed with the ZLIB compression library, which the JDK exposes through the java.util.zip package. The compression ratio varies with factors such as the chosen compression level, the length of the data, and how repetitive the string data is.
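As a rough sketch of ZLIB-based string compression, the following uses java.util.zip.Deflater and java.util.zip.Inflater from the JDK. The class name ZlibStringDemo, the helper names compress and decompress, the UTF-8 encoding, and the buffer size are assumptions made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ZlibStringDemo {

    // Compresses the UTF-8 bytes of the text with ZLIB (deflate).
    public static byte[] compress(String text) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end(); // release native resources
        return out.toByteArray();
    }

    // Restores the original text from ZLIB-compressed UTF-8 bytes.
    public static String decompress(byte[] data) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // Guard against looping forever on incomplete input.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("incomplete or dictionary-dependent input");
                }
                out.write(buffer, 0, n);
            }
        } finally {
            inflater.end(); // release native resources
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws DataFormatException {
        // Build a repetitive input so that compression has something to exploit.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            sb.append("aaaabbcddddd");
        }
        String original = sb.toString();
        byte[] packed = compress(original);
        System.out.println(original.length() + " chars -> " + packed.length + " bytes");
        System.out.println("round trip ok: " + original.equals(decompress(packed)));
    }
}
```

Very short strings often come out larger after compression, because the ZLIB header and checksum add a fixed overhead.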
char is a primitive data type, whereas String is a class in Java. A char represents a single character (one UTF-16 code unit), whereas a String holds zero or more characters. Conceptually a String is a sequence of chars, although since Java 9 it is stored internally in a byte[]. A char literal is written in single quotes ('a') and a String literal in double quotes ("a").
Compressed strings (Java 6) and compact strings (Java 9) both have the same motivation (strings are often effectively Latin-1, so half the space is wasted) and goal (make those strings small) but the implementations differ a lot.
In an interview Aleksey Shipilëv (who was in charge of implementing the Java 9 feature) had this to say about compressed strings:
"UseCompressedStrings feature was rather conservative: while distinguishing between char[] and byte[] case, and trying to compress the char[] into byte[] on String construction, it done most String operations on char[], which required to unpack the String. Therefore, it benefited only a special type of workloads, where most strings are compressible (so compression does not go to waste), and only a limited amount of known String operations are performed on them (so no unpacking is needed). In great many workloads, enabling -XX:+UseCompressedStrings was a pessimization.
[...] UseCompressedStrings implementation was basically an optional feature that maintained a completely distinct String implementation in alt-rt.jar, which was loaded once the VM option is supplied. Optional features are harder to test, since they double the number of option combinations to try."
In Java 9, on the other hand, compact strings are fully integrated into the JDK source. String is always backed by a byte[]: if every character of the string is Latin-1, each character takes one byte, and otherwise each takes two. Most operations check which is the case, e.g. charAt:

    public char charAt(int index) {
        if (isLatin1()) {
            return StringLatin1.charAt(value, index);
        } else {
            return StringUTF16.charAt(value, index);
        }
    }
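To make the dispatch above more concrete, here is a heavily simplified sketch of a byte[]-backed string with a coder flag. It is not the JDK implementation: the class CompactText, its fields, and the big-endian layout of the two-byte case are assumptions made purely for illustration.

```java
public final class CompactText {
    private static final byte LATIN1 = 0; // one byte per character
    private static final byte UTF16 = 1;  // two bytes per character

    private final byte[] value;
    private final byte coder;

    public CompactText(String s) {
        // The whole string is stored in one byte per character only if
        // every character fits into Latin-1 (code units 0x00 to 0xFF).
        boolean latin1 = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                latin1 = false;
                break;
            }
        }
        if (latin1) {
            value = new byte[s.length()];
            for (int i = 0; i < s.length(); i++) {
                value[i] = (byte) s.charAt(i);
            }
            coder = LATIN1;
        } else {
            value = new byte[s.length() * 2];
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                value[2 * i] = (byte) (c >> 8); // high byte (illustrative big-endian layout)
                value[2 * i + 1] = (byte) c;    // low byte
            }
            coder = UTF16;
        }
    }

    public int length() {
        return value.length >> coder; // halve the byte count in the two-byte case
    }

    public int byteSize() {
        return value.length;
    }

    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new StringIndexOutOfBoundsException(index);
        }
        if (coder == LATIN1) {
            return (char) (value[index] & 0xFF);
        }
        // Reassemble the char from its two separate bytes.
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    public static void main(String[] args) {
        System.out.println(new CompactText("hello").byteSize());       // prints 5
        System.out.println(new CompactText("h\u00e9llo").byteSize());  // prints 5 (U+00E9 is Latin-1)
        System.out.println(new CompactText("hello\u20ac").byteSize()); // prints 12 (U+20AC is not)
    }
}
```

The last line of main shows the all-or-nothing nature of the scheme: a single non-Latin-1 character doubles the storage of the entire string.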
Compact strings are enabled by default and can be partially disabled with -XX:-CompactStrings. "Partially" because strings are still backed by a byte[], and operations returning chars must still assemble them from two separate bytes (due to intrinsics it is hard to say whether this has a performance impact).
If you're interested in more background on compact strings, I recommend reading the interview quoted above and/or watching the great talk by the same Aleksey Shipilëv (which also explains the new string concatenation).