Differing sizes of String representation in Java

I'm comparing the various ways of storing a String in Java by breaking a String down into its constituent parts. I have this code snippet:

final String message = "ABCDEFGHIJ";
System.out.println("As String " + RamUsageEstimator.humanSizeOf(message));
System.out.println("As byte[] " + RamUsageEstimator.humanSizeOf(message.getBytes()));
System.out.println("As char[] " + RamUsageEstimator.humanSizeOf(message.toCharArray()));

This uses sizeof (its RamUsageEstimator class) to measure the size of the objects. The output is:

As String 64 bytes
As byte[] 32 bytes
As char[] 40 bytes

Given that a byte is 8 bits and a char is 16 bits, why are the results not 10 bytes and 20 bytes respectively?

Also, what is the overhead in the String object that makes it twice the size of the underlying byte[]?

This is using the following JVM on OS X:

java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)

The data below is for Hotspot / Java 8 - numbers will vary for other JVMs/Java versions (for example, in Java 7, String has two additional int fields).
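Because the field list differs between versions, one way to check what your own JVM's String carries is to list its declared instance fields via reflection. This is a minimal sketch (the class name StringFields is illustrative); it only reads field names and types, so no setAccessible call is needed. On Java 8 it should report value (a char[]) and hash; on JDK 9 and later, where compact strings store a byte[], expect extra fields such as coder.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class StringFields {
    // Returns "type name" for every non-static field declared by java.lang.String
    // on the JVM this runs on. The order is whatever reflection returns.
    static List<String> instanceFields() {
        List<String> result = new ArrayList<>();
        for (Field f : String.class.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers())) {
                result.add(f.getType().getSimpleName() + " " + f.getName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(System.getProperty("java.version") + ": " + instanceFields());
    }
}
```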

A new Object() has a 12-byte object header (an 8-byte mark word plus a 4-byte compressed class pointer on a 64-bit JVM), which is padded to 16 bytes in total.

A String has (number of bytes in brackets):

- an object header (12),
- a reference to a char[] (4, assuming compressed OOPs on a 64-bit JVM),
- an int hash (4).

That's 20 bytes, but objects are padded to a multiple of 8 bytes, giving 24. So the String object alone costs 24 bytes on top of the actual content of the array.
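The String arithmetic above can be reproduced in a few lines. This is a sketch under the answer's assumptions (64-bit HotSpot, Java 8, compressed oops, 8-byte alignment); the constants are hard-coded rather than queried from the JVM, and the class and method names are illustrative.

```java
class StringShallowSize {
    // Assumed layout constants for 64-bit HotSpot / Java 8 with compressed oops.
    // They are NOT read from the running JVM.
    static final int HEADER = 12;    // mark word (8) + compressed class pointer (4)
    static final int REFERENCE = 4;  // compressed reference to the char[]
    static final int INT = 4;        // the int hash field
    static final int ALIGNMENT = 8;  // objects are padded to a multiple of 8 bytes

    // Rounds a raw size up to the next multiple of ALIGNMENT.
    static long align(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Size of the String object itself, excluding the array it points to.
    static long stringShallowSize() {
        long raw = HEADER + REFERENCE + INT; // 20 bytes of fields and header
        return align(raw);                   // padded to 24
    }

    public static void main(String[] args) {
        System.out.println("header-only object: " + align(HEADER) + " bytes");
        System.out.println("String (shallow):   " + stringShallowSize() + " bytes");
    }
}
```

With these constants a header-only object (12 bytes) rounds up to 16, and the String's 20 bytes round up to 24.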

The char[] has a header (12), a length (4) and the chars (10 x 2 = 20) = 36, padded to the next multiple of 8 = 40.

The byte[] has a header (12), a length (4) and the bytes (10 x 1 = 10) = 26, padded to the next multiple of 8 = 32.
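The same formula applied to the two arrays, again as a hedged sketch that assumes a 12-byte array header, a 4-byte length field and 8-byte alignment (ArraySize and arraySize are hypothetical names):

```java
class ArraySize {
    // Assumed constants for 64-bit HotSpot with compressed oops.
    static final int ARRAY_HEADER = 12; // mark word (8) + compressed class pointer (4)
    static final int LENGTH_FIELD = 4;  // int length stored in every array
    static final int ALIGNMENT = 8;

    static long align(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Estimated size of a primitive array with `length` elements of `elementSize` bytes each.
    static long arraySize(int elementSize, int length) {
        return align(ARRAY_HEADER + LENGTH_FIELD + (long) elementSize * length);
    }

    public static void main(String[] args) {
        int n = "ABCDEFGHIJ".length(); // 10 chars
        System.out.println("char[]: " + arraySize(Character.BYTES, n) + " bytes"); // 16 + 20 = 36, padded
        System.out.println("byte[]: " + arraySize(Byte.BYTES, n) + " bytes");      // 16 + 10 = 26, padded
    }
}
```

arraySize(Character.BYTES, 10) gives 40 and arraySize(Byte.BYTES, 10) gives 32; adding the String's own 24 bytes from the text above to the 40-byte char[] gives the 64 bytes reported for the String.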

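So we get to your numbers. humanSizeOf follows references, so the String figure is the String object itself (24) plus the char[] it points to (40), i.e. 64 bytes. Note that in Java 8 the String's underlying array is that char[]; the byte[] returned by getBytes() is a separate, encoded copy.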

Also note that the number of bytes depends on the encoding you use: if you retry with message.getBytes(StandardCharsets.UTF_16), for example, you will see that the byte array uses 40 bytes instead of 32.
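To see the encoding effect directly, this sketch prints how many bytes a few standard charsets produce for the message, and what the resulting byte[] would weigh under the same size formula (header 12 + length 4, rounded up to 8; an assumption for 64-bit HotSpot with compressed oops). StandardCharsets.UTF_16 prepends a 2-byte byte-order mark, which is why it yields 22 bytes rather than 20. The class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLengths {
    // Number of bytes getBytes(charset) produces for the given text.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Estimated byte[] size: 12-byte header + 4-byte length + payload, rounded up to 8.
    // Assumes 64-bit HotSpot with compressed oops.
    static long byteArraySize(int length) {
        return (12 + 4 + length + 7) / 8 * 8;
    }

    public static void main(String[] args) {
        String message = "ABCDEFGHIJ";
        Charset[] charsets = {
            StandardCharsets.US_ASCII, StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE, StandardCharsets.UTF_16
        };
        for (Charset cs : charsets) {
            int len = encodedLength(message, cs);
            System.out.println(cs + ": " + len + " bytes of data, ~"
                + byteArraySize(len) + " bytes as a byte[]");
        }
    }
}
```

For the ten ASCII letters, US-ASCII and UTF-8 produce 10 bytes (a 32-byte array), while UTF-16BE (20 bytes) and UTF-16 (22 bytes) both round up to a 40-byte array.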

You can use jol (Java Object Layout) to visualise the memory usage and confirm the calculation above. The output for the char[] is:

 OFFSET  SIZE  TYPE DESCRIPTION                    VALUE
      0     4       (object header)                01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4       (object header)                00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4       (object header)                41 00 00 f8 (01000001 00000000 00000000 11111000) (-134217663)
     12     4       (object header)                0a 00 00 00 (00001010 00000000 00000000 00000000) (10)
     16    20  char [C.<elements>                  N/A
     36     4       (loss due to the next object alignment)
Instance size: 40 bytes (reported by Instrumentation API)

So you can see the 12-byte header (first three rows), the length, 10 (fourth row), the 20 bytes of chars (fifth row) and the 4 bytes of padding (sixth row).

Similarly for the String (note that this excludes the size of the array itself):

 OFFSET  SIZE   TYPE DESCRIPTION                    VALUE
      0     4        (object header)                01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4        (object header)                00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                da 02 00 f8 (11011010 00000010 00000000 11111000) (-134216998)
     12     4 char[] String.value                   [A, B, C, D, E, F, G, H, I, J]
     16     4    int String.hash                    0
     20     4        (loss due to the next object alignment)
Instance size: 24 bytes (reported by Instrumentation API)

Each of your tests estimates the size of an object: in the first case a String object, in the second a byte array object, and finally a char array object. Every object, as an instance of a class, may contain some private fields and other internal data, so you cannot expect anything better than this: a String of 10 chars contains at least the 10 chars, each 2 bytes long, so its whole size should be ≥ 20 bytes, which is consistent with your results.

For the byte/char comparison, your reasoning does not hold, because the byte array you get from a string contains all the bytes for a given encoding (getBytes() without an argument uses the platform's default charset). Your current encoding may use more than one byte for some chars.
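A small sketch of this point: it prints the JVM's default charset (what getBytes() uses when no charset is given) and compares char counts with encoded byte counts for the ASCII message and an illustrative sample with accented letters. Only standard java.nio.charset APIs are used; the class name, helper names and sample strings are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsVsBytes {
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        // String.getBytes() without an argument encodes with this charset.
        System.out.println("Default charset: " + Charset.defaultCharset());

        String[] samples = {
            "ABCDEFGHIJ",   // 10 ASCII chars
            "\u00e9t\u00e9" // 3 chars, two of them outside ASCII
        };
        for (String s : samples) {
            System.out.println(s.length() + " chars -> " + utf8Length(s) + " bytes in UTF-8, "
                + latin1Length(s) + " bytes in ISO-8859-1");
        }
    }
}
```

The ASCII message gives 10 bytes in both encodings, while the 3-char accented sample needs 5 bytes in UTF-8 but only 3 in ISO-8859-1.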

You can look at the Java source code for Object, the String class and the array support in the JVM to understand exactly what happens.

Tags: java. Question by imrichardcole; 2 answers, by assylias and Jean-Baptiste Yunès.
