
Java char array seems to need more than 2 bytes per char

When I run the following program (with java -Xmx151M -cp . com.some.package.xmlfun.Main):

package com.some.package.xmlfun;
public class Main {

    public static void main(String [] args) {
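        // 50 * 1024 * 1024 chars * 2 bytes/char = 100 MiB of raw array data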
        char [] chars = new char[50 * 1024 * 1024];

    }
}

I need to increase the maximum heap size to at least 151M (-Xmx151M). Accordingly, when I increase the array size, the limit has to grow with it:

  • 50 * 1024 * 1024 -> -Xmx151M
  • 100 * 1024 * 1024 -> -Xmx301M
  • 150 * 1024 * 1024 -> -Xmx451M

Why does it look like Java needs 3 bytes per char instead of the 2 bytes the documentation suggests?

Also, when I similarly create an array of long it seems to need 12 bytes per long instead of 8, and with int it needs 6 bytes instead of 4. Generally, it looks like it needs array_size * element_size * 1.5.

Compiling with: javac com\some\package\xmlfun\*.java

Running with: java -Xmx151M -cp . com.some.package.xmlfun.Main
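
To see where the limit actually falls under a given -Xmx, a rough probe along these lines (the ArrayProbe class is illustrative, not part of the program above) binary-searches for the largest char[] the heap will hold:

package com.some.package.xmlfun;

public class ArrayProbe {

    private static Object sink; // hold a reference so the allocation really happens

    public static void main(String[] args) {
        long lo = 0;                 // size that is known to fit
        long hi = Integer.MAX_VALUE; // size that is known to fail under a small -Xmx
        while (lo + 1 < hi) {
            long mid = (lo + hi) / 2;
            if (fits((int) mid)) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        System.out.println("largest char[]: " + lo + " chars, "
                + (lo * 2) + " bytes of character data");
    }

    private static boolean fits(int size) {
        try {
            sink = new char[size];
            sink = null; // release it before the next attempt
            return true;
        } catch (OutOfMemoryError e) {
            return false;
        }
    }
}

If the 1.5 factor holds, running this with -Xmx151M should report a maximum just above 50 * 1024 * 1024 chars (GC timing can make the exact cut-off fuzzy).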

asked Jun 27 '13 by user2528238




1 Answer

I guess what you are seeing can be easily explained by how the heap in the JVM is organized.

When you pass the parameter -Xmx to the JVM, you are defining what the maximum heap size should be. However, it is not directly related to the maximum size of an array that you can allocate.

In the JVM, the garbage collector is responsible for allocating memory for objects and for cleaning up dead objects. It is the garbage collector that decides how it organizes the heap.

You usually have something called the Eden space, then two survivor spaces, and finally the tenured generation. All of these are inside the heap, and the GC divides the maximum heap among them. For more details on these memory pools, check this brilliant answer: https://stackoverflow.com/a/1262474/150339
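
If you want to see how your own JVM splits the heap, a minimal sketch along these lines (the PoolSizes class name is mine; the pool names printed depend on the collector in use) lists each heap pool and its maximum size via the standard java.lang.management API:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class PoolSizes {
    public static void main(String[] args) {
        // Print every heap pool (Eden, survivors, old/tenured generation)
        // with its configured maximum; -1 means the maximum is undefined.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                System.out.println(pool.getName() + ": max = "
                        + pool.getUsage().getMax() + " bytes");
            }
        }
    }
}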

I don't know what the default values are, and they might depend on your system. I've just checked (using sudo jmap PID) how the memory pools divide the heap in an application I run on 64-bit Ubuntu with Oracle's Java 7. The machine has 1.7 GB of RAM.

In that configuration, I only pass -Xmx to the JVM, and the GC divides the heap as follows:

  • about 27% for the Eden space
  • about 3% for each of the survivor spaces
  • about 67% for the tenured generation.
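
(For reference, on JDK 7 that per-pool breakdown comes from jmap -heap <PID>; on JDK 9 and later the equivalent command is jhsdb jmap --heap --pid <PID>.)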

If you have a similar distribution, it would mean that the largest contiguous block of your 151 MB is the tenured generation, at about 100 MB. Since an array is a contiguous block of memory, and an object cannot span multiple memory pools, this explains the behaviour you are seeing.
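
Plugging the question's numbers into that distribution (assuming the tenured share really is about two thirds):

50 * 1024 * 1024 chars * 2 bytes/char = 100 MiB of raw character data
100 MiB / 0.67 (tenured share of the heap) = roughly 150 MiB of total heap

which matches the observed -Xmx151M, and the same factor of about 1.5 accounts for the long and int arrays as well.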

You could try playing with the garbage collector parameters; they are documented here: http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
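
For example, the old/young split is controlled by -XX:NewRatio, whose default of 2 on typical server JVMs matches the roughly one-third/two-thirds division above. As an untested illustration, -XX:NewRatio=5 would give the tenured generation 5/6 of the heap, so the 100 MiB array should fit under a noticeably smaller limit:

java -XX:NewRatio=5 -Xmx121M -cp . com.some.package.xmlfun.Main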

Your results seem pretty reasonable to me.

answered Oct 17 '22 by Bruno Reis