
Difference between compact strings and compressed strings in Java 9

What are the advantages of compact strings over compressed strings in JDK9?

asked May 25 '17 10:05 by soorapadman



1 Answer

Compressed strings (Java 6) and compact strings (Java 9) both have the same motivation (strings are often effectively Latin-1, so half the space is wasted) and goal (make those strings small) but the implementations differ a lot.

Compressed Strings

In an interview Aleksey Shipilëv (who was in charge of implementing the Java 9 feature) had this to say about compressed strings:

UseCompressedStrings feature was rather conservative: while distinguishing between char[] and byte[] case, and trying to compress the char[] into byte[] on String construction, it done most String operations on char[], which required to unpack the String. Therefore, it benefited only a special type of workloads, where most strings are compressible (so compression does not go to waste), and only a limited amount of known String operations are performed on them (so no unpacking is needed). In great many workloads, enabling -XX:+UseCompressedStrings was a pessimization.

[...] UseCompressedStrings implementation was basically an optional feature that maintained a completely distinct String implementation in alt-rt.jar, which was loaded once the VM option is supplied. Optional features are harder to test, since they double the number of option combinations to try.
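To make the cost described in the quote concrete, here is a minimal sketch of the "compress on construction, unpack on use" pattern. It is not the Java 6 code: the class CompressedText, its methods, and the rule that a string is compressible when every char fits in one byte are all assumptions made for illustration.

```java
/**
 * Hypothetical sketch of "compress on construction, unpack on use".
 * Not the Java 6 implementation; all names here are made up.
 */
final class CompressedText {
    // Holds either a byte[] (compressed) or a char[] (uncompressed).
    private final Object value;
    // Counts how often an operation had to unpack the compressed form.
    private int inflations;

    CompressedText(char[] chars) {
        byte[] packed = tryCompress(chars);
        this.value = (packed != null) ? packed : chars.clone();
    }

    // Assumed rule: compressible only if every char fits in one byte.
    private static byte[] tryCompress(char[] chars) {
        byte[] out = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] > 0xFF) {
                return null; // compression attempt wasted
            }
            out[i] = (byte) chars[i];
        }
        return out;
    }

    boolean isCompressed() {
        return value instanceof byte[];
    }

    // Operations work on char[], so the compressed form is unpacked first.
    private char[] inflate() {
        if (value instanceof char[]) {
            return (char[]) value;
        }
        inflations++;
        byte[] bytes = (byte[]) value;
        char[] out = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = (char) (bytes[i] & 0xFF);
        }
        return out;
    }

    int indexOf(char c) {
        char[] chars = inflate(); // temporary char[] on every call
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] == c) {
                return i;
            }
        }
        return -1;
    }

    int inflations() {
        return inflations;
    }
}
```

In this sketch every indexOf call on a compressed instance allocates a fresh char[], which is the kind of overhead that can turn the space saving into a net loss.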

Compact Strings

In Java 9, on the other hand, compact strings are fully integrated into the JDK source. String is always backed by byte[]: if every character of the string is Latin-1, each character takes one byte; otherwise every character takes two bytes (UTF-16). Most operations check which is the case, e.g. charAt:

    public char charAt(int index) {
        if (isLatin1()) {
            return StringLatin1.charAt(value, index);
        } else {
            return StringUTF16.charAt(value, index);
        }
    }
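The dispatch in charAt can be imitated with a small toy class. This is only a sketch under assumptions: CompactText, its coder constants, and the big-endian byte layout are invented for illustration and are not the JDK's String internals.

```java
/**
 * Toy illustration of a byte[]-backed string with a coder flag.
 * Not the JDK's String; names and byte order are assumptions.
 */
final class CompactText {
    private static final byte LATIN1 = 0;
    private static final byte UTF16 = 1;

    private final byte[] value;
    private final byte coder;

    private CompactText(byte[] value, byte coder) {
        this.value = value;
        this.coder = coder;
    }

    static CompactText of(String s) {
        // One char outside Latin-1 forces two bytes for every char.
        boolean latin1 = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                latin1 = false;
                break;
            }
        }
        if (latin1) {
            byte[] bytes = new byte[s.length()];
            for (int i = 0; i < s.length(); i++) {
                bytes[i] = (byte) s.charAt(i);
            }
            return new CompactText(bytes, LATIN1);
        }
        byte[] bytes = new byte[s.length() * 2];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            bytes[2 * i] = (byte) (c >> 8); // high byte first (assumed order)
            bytes[2 * i + 1] = (byte) c;    // low byte
        }
        return new CompactText(bytes, UTF16);
    }

    boolean isLatin1() {
        return coder == LATIN1;
    }

    char charAt(int index) {
        if (isLatin1()) {
            return (char) (value[index] & 0xFF);
        }
        // Put the char together from two separate bytes.
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    int byteCount() {
        return value.length;
    }

    public static void main(String[] args) {
        System.out.println(CompactText.of("hello").byteCount());       // 5
        System.out.println(CompactText.of("h\u00E9llo").byteCount());  // 5
        System.out.println(CompactText.of("h\u20ACllo").byteCount());  // 10
    }
}
```

A single character outside Latin-1 (here the euro sign, U+20AC) doubles the storage for the whole string, which is why the saving depends on strings being entirely Latin-1.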

Compact strings are enabled by default and can be partially disabled with -XX:-CompactStrings. "Partially" because strings are still backed by a byte[], so operations returning chars must still put them together from two separate bytes (because of intrinsics, it is hard to say whether this has a performance impact).

More

If you're interested in more background on compact strings, I recommend reading the interview I linked to above and/or watching this great talk by the same Aleksey Shipilëv (which also explains the new string concatenation).

answered Sep 21 '22 15:09 by Nicolai Parlog