
java.lang.String: length() vs. count?

I have a test string:

String test = "oiwfoilfhlshflkshdlkfhsdlfhlskdhfslkhvslkvhvkjdhfkljshvdfkjhvdsköljhvskljdfhvblskjbkvljslkhjjssdlkhdsflksjflkjdlfjslkjljlfjslfjldfjjhvbksdjhbvslkdfjhbvslkjvhbslkvbjbn";

While debugging, I noticed the following. When I print the length:

System.out.println("Test length() : " + test.length());

it prints

Test length() : 166

In the debugger, however, the test variable shows a count of 333.


What does the count represent?

Tranquillo asked Dec 12 '18 14:12


2 Answers

A String implementation contains an array of chars named value. In some implementations, a separate count field tracks how many characters of that array the string uses.

Note that the reported count is about twice the string's length (333 vs. 166). This hints at an ASCII/UTF-8/UTF-16 difference: a String instance represents each UTF-16 code unit with 2 bytes.

An example:

String str = "f";
str.length(); // 1
str.getBytes(StandardCharsets.UTF_8).length; // 1

but

String str = "ў";
str.length(); // 1
str.getBytes(StandardCharsets.UTF_8).length; // 2
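
The comparison above can be put into a small runnable sketch. It names the charset explicitly so the result does not depend on the platform default, and it adds UTF-16 for contrast. The class and method names (LengthVsBytes, utf8Bytes, utf16Bytes) are made up for illustration and are not part of any library.

```java
import java.nio.charset.StandardCharsets;

final class LengthVsBytes {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes when encoded as UTF-16 (big-endian, no byte order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // "\u045E" is the character "ў" from the example above.
        for (String s : new String[] {"f", "\u045E"}) {
            System.out.println(s.length() + " char(s), "
                    + utf8Bytes(s) + " UTF-8 byte(s), "
                    + utf16Bytes(s) + " UTF-16 byte(s)");
        }
    }
}
```

Both strings have length() 1 and take 2 bytes in UTF-16. Only the UTF-8 size differs: 1 byte for "f", 2 bytes for "ў".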

See also:

  • Apache String Implementation
  • Android String Implementation

Which JDK are you using? That may shed more light on what exactly your count is.

Anton Hlinisty answered Nov 07 '22 20:11


When asking Java questions that concern Android, always say so, because there are some major differences.

Android's ART runtime optimizes java.lang.String by storing the normally two-byte Java chars as single-byte ASCII characters when possible. You can see it in the Android source of java.lang.String:

public int length() {
    // BEGIN Android-changed: Get length from count field rather than value array (see above).
    // return value.length;
    final boolean STRING_COMPRESSION_ENABLED = true;
    if (STRING_COMPRESSION_ENABLED) {
        // For the compression purposes (save the characters as 8-bit if all characters
        // are ASCII), the least significant bit of "count" is used as the compression flag.
        return (count >>> 1);
    } else {
        return count;
    }
}

String compression is specified in the native code as:

// String Compression
static constexpr bool kUseStringCompression = true;
enum class StringCompressionFlag : uint32_t {
    kCompressed = 0u,
    kUncompressed = 1u
};

The length is shifted left by one bit, and this flag is OR-ed into the lowest bit of the count value:

static int32_t GetFlaggedCount(int32_t length, bool compressible) {
    return kUseStringCompression
        ? static_cast<int32_t>((static_cast<uint32_t>(length) << 1) |
                               (static_cast<uint32_t>(compressible
                                                          ? StringCompressionFlag::kCompressed
                                                          : StringCompressionFlag::kUncompressed)))
        : length;
}

When strings are loaded from the constant pool, however, string compression is not performed. Hence the count is twice the original char count plus 1 (333 = 166 * 2 + 1). The extra 1 is the "uncompressed" flag.
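
The arithmetic can be checked with a minimal Java sketch that mirrors the shift-and-flag scheme quoted above. It is an illustration, not ART's code: the class name FlaggedCount and its methods are hypothetical, and the constants simply copy the values of the StringCompressionFlag enum.

```java
final class FlaggedCount {
    // Flag values copied from the StringCompressionFlag enum quoted above.
    static final int COMPRESSED = 0;
    static final int UNCOMPRESSED = 1;

    // Pack length and flag into one int, in the manner of GetFlaggedCount.
    static int encode(int length, boolean compressible) {
        return (length << 1) | (compressible ? COMPRESSED : UNCOMPRESSED);
    }

    // Recover the character count, as the Android length() does.
    static int length(int count) {
        return count >>> 1;
    }

    // The least significant bit carries the compression flag.
    static boolean isCompressed(int count) {
        return (count & 1) == COMPRESSED;
    }

    public static void main(String[] args) {
        int count = 333; // the value shown in the debugger
        System.out.println(length(count));       // 166
        System.out.println(isCompressed(count)); // false
        System.out.println(encode(166, false));  // 333
        System.out.println(encode(166, true));   // 332
    }
}
```

Under this scheme, a compressed string of the same length would show a count of 332 instead of 333.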

rustyx answered Nov 07 '22 20:11