
What is the character encoding of String in Java?

People also ask

What is the character encoding in Java?

The native character encoding of the Java programming language is UTF-16. A charset in the Java platform therefore defines a mapping between sequences of sixteen-bit UTF-16 code units (that is, sequences of chars) and sequences of bytes.

What is the encoding of a String?

String objects use UTF-16 encoding internally, and that internal encoding cannot be changed. The only way to obtain a string's characters in a different encoding is to convert them to a byte[] array, e.g. via getBytes(Charset).
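For example (a minimal sketch; the class name is just for illustration):

    import java.nio.charset.StandardCharsets;

    public class GetBytesDemo {
        public static void main(String[] args) {
            String s = "héllo";

            // The String itself stays UTF-16 internally; to obtain its
            // characters in another encoding, convert to a byte[].
            byte[] utf8   = s.getBytes(StandardCharsets.UTF_8);
            byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);

            System.out.println(utf8.length);   // 6 ('é' is two bytes in UTF-8)
            System.out.println(latin1.length); // 5 ('é' is one byte in Latin-1)

            // Decoding is the reverse mapping, from bytes back to a String.
            System.out.println(new String(utf8, StandardCharsets.UTF_8)); // héllo
        }
    }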

Is a Java String UTF-8?

A Java String is internally always encoded in UTF-16 - but you really should think about it like this: an encoding is a way to translate between Strings and bytes.

Is a Java String UTF-16?

A Java String (before Java 9) is represented internally in the Java VM using bytes, encoded as UTF-16. Since Java 9, compact strings store the contents as Latin-1 when every character fits in one byte, falling back to UTF-16 otherwise.


  1. Java stores strings as UTF-16 internally.

  2. "default encoding" isn't quite right. Java stores strings as UTF-16 internally, but the encoding used externally, the "system default encoding", varies from platform to platform, and can even be altered by things like environment variables on some platforms.

    ASCII is a subset of Latin-1, which is a subset of Unicode. UTF-16 is one way of encoding Unicode. So if you perform your int i = 'x' test for any character that falls in the ASCII range, you'll get the ASCII value; UTF-16 can represent far more characters than ASCII, however (see the snippet after this list).

  3. From the java.lang.Character docs:

    The Java 2 platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes.

    So it's defined as part of the Java 2 platform that UTF-16 is used for these classes.
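A minimal sketch of both points (the class name is made up; the default charset printed will vary by platform):

    import java.nio.charset.Charset;

    public class DefaultEncodingDemo {
        public static void main(String[] args) {
            // 'x' falls in the ASCII range, so its UTF-16 code-unit value
            // equals its ASCII value.
            int i = 'x';
            System.out.println(i); // 120

            // The external "system default encoding" is platform-dependent
            // (and can be influenced by the file.encoding system property).
            System.out.println(Charset.defaultCharset()); // e.g. UTF-8
        }
    }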


1) Strings are objects, which typically contain a char array and the string's length. The character array is usually implemented as a contiguous array of 16-bit words, each one containing a Unicode character in native byte order.

2) Assigning a character value to an integer converts the 16-bit Unicode character code into its integer equivalent. Thus 'c', which is U+0063, becomes 0x0063, or 99 (see the snippet below).

3) Since each String is an object, it contains information other than its class members (e.g., a class descriptor word, a lock/semaphore word, etc.).
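A short sketch of points 1) and 2) (the class name is illustrative only):

    public class CharToIntDemo {
        public static void main(String[] args) {
            // Widening a char to an int yields the 16-bit character code:
            // 'c' is U+0063, i.e. 0x63 == 99.
            int n = 'c';
            System.out.println(n);                      // 99
            System.out.println(Integer.toHexString(n)); // 63

            // The string's characters are exposed as an array of 16-bit units.
            char[] units = "caf\u00E9".toCharArray();   // "café"
            System.out.println(units.length);           // 4
        }
    }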

ADDENDUM
The object contents depend on the JVM implementation (which determines the inherent overhead associated with each object), and how the class is actually coded (i.e., some libraries may be more efficient than others).

EXAMPLE
A typical implementation will allocate an overhead of two words per object instance (for the class descriptor/pointer, and a semaphore/lock control word); a String object also contains an int length and a char[] array reference. The actual character contents of the string are stored in a second object, the char[] array, which in turn is allocated two words, plus an array length word, plus as many 16-bit char elements as needed for the string (plus any extra chars that were left hanging around when the string was created).
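As a rough worked example under those assumptions (4-byte words): the 5-character string "hello" costs 2 + 1 + 1 = 4 words (16 bytes) for the String object itself, plus 2 + 1 = 3 words of char[] header and 5 × 2 = 10 bytes of character data rounded up to 3 words, i.e. 6 words (24 bytes) for the array, roughly 40 bytes in total. The exact figures are implementation-specific.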

ADDENDUM 2
The assumption that one char represents one Unicode character holds only in most cases. It would imply UCS-2 encoding, and it was true before 2005. But Unicode has since grown beyond 16 bits, and Strings now have to be encoded using UTF-16, where, alas, a single Unicode character may use two chars (a surrogate pair) in a Java String.
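A small demonstration with a character outside the Basic Multilingual Plane (class name is illustrative only):

    public class SurrogatePairDemo {
        public static void main(String[] args) {
            // U+1D11E (MUSICAL SYMBOL G CLEF) is outside the BMP, so
            // UTF-16 stores it as a surrogate pair of two chars.
            String clef = "\uD834\uDD1E";

            System.out.println(clef.length());                         // 2 code units
            System.out.println(clef.codePointCount(0, clef.length())); // 1 character
            System.out.println(Character.charCount(0x1D11E));          // 2
        }
    }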

Take a look at the actual source code for Apache's implementation, e.g. at:
http://www.docjar.com/html/api/java/lang/String.java.html


While this doesn't answer your question, it is worth noting that in Java byte code (the class file), string constants are stored in modified UTF-8: http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc.html
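DataOutputStream.writeUTF uses the same modified UTF-8 format as class-file constant pool entries, so it can illustrate the difference from standard UTF-8 (a sketch; the class name is made up):

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    public class ModifiedUtf8Demo {
        public static void main(String[] args) throws IOException {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);

            out.writeUTF("A\u0000B");

            // The first two bytes are a length prefix; note that the NUL
            // character becomes the two bytes C0 80 rather than the single
            // 00 byte standard UTF-8 would use.
            for (byte b : buffer.toByteArray()) {
                System.out.printf("%02X ", b); // 00 04 41 C0 80 42
            }
        }
    }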


Edit: thanks to LoadMaster for helping me correct my answer :)

1) All internal String processing is done in UTF-16.

2) Every ASCII character is also representable in UTF-16 (ASCII is a subset of the characters UTF-16 can encode), although the byte-level encodings differ.

3) Internally, Java uses UTF-16. For everything else (files, streams, the console), the encoding depends on where you are, yes.