
Is there a drastic difference between UTF-8 and UTF-16?

Tags:

java

character-encoding

xml

utf-8

utf-16

I call a web service that returns an XML response encoded in UTF-8. I checked this in Java using the getAllHeaders() method.

In my Java code, I take that response, do some processing on it, and later pass it on to a different service.

I googled a bit and found out that Java strings use UTF-16 by default.

One of the elements in my response XML contained the character É. It got mangled in the post-processing request that I make to the other service.

Instead of É, it sent gibberish. Is there really a big difference between the two encodings? And how can I find out what É becomes when converted from UTF-8 to UTF-16?

412

asked Mar 14 '14 12:03

Kraken

1 Answer

Both UTF-8 and UTF-16 are variable-length encodings. In UTF-8 a character occupies between one and four bytes (a minimum of 8 bits), while in UTF-16 it occupies either two or four bytes (a minimum of 16 bits).
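To see what É looks like in each encoding, a minimal sketch along these lines should work. The class name `EncodeEAcute` and the `hex` helper are made up for illustration; `UTF_16BE` is used instead of `UTF_16` because the latter prepends a byte order mark when encoding.

```java
import java.nio.charset.StandardCharsets;

public class EncodeEAcute {
    // Formats bytes as space-separated uppercase hex, e.g. "C3 89".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // É is U+00C9; written as an escape so the source file encoding does not matter.
        String s = "\u00C9";
        System.out.println("UTF-8:    " + hex(s.getBytes(StandardCharsets.UTF_8)));    // C3 89
        System.out.println("UTF-16BE: " + hex(s.getBytes(StandardCharsets.UTF_16BE))); // 00 C9
    }
}
```

So the same character is two different byte sequences, and bytes written in one encoding must be read back with the same one.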

Main UTF-8 pros:

Basic ASCII characters, such as digits and unaccented Latin letters, occupy one byte that is identical to their US-ASCII representation. This way all US-ASCII strings are also valid UTF-8, which provides decent backwards compatibility in many cases.
No null bytes (except for the NUL character itself), which allows null-terminated strings; this provides a great deal of backwards compatibility too.
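The ASCII compatibility point can be checked directly. This is only an illustrative sketch; `AsciiCompat` and `sameAsAscii` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiCompat {
    // True when the UTF-8 and US-ASCII encodings of s are byte-for-byte identical.
    public static boolean sameAsAscii(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(utf8, ascii);
    }

    public static void main(String[] args) {
        System.out.println(sameAsAscii("Hello 123")); // true: pure ASCII text
        System.out.println(sameAsAscii("\u00C9"));    // false: É is not in US-ASCII
    }
}
```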

Main UTF-8 cons:

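Characters vary in length, including many common ones, so finding the n-th character or counting the characters in a string means scanning the bytes from the start, which is much slower than with a fixed-width encoding.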

Main UTF-16 pros:

Most commonly used characters, including Latin, Cyrillic, Chinese, and Japanese, can be represented with 2 bytes. Unless you need characters outside the Basic Multilingual Plane (most emoji, for example), the 16-bit subset of UTF-16 can be treated as a fixed-length encoding, which speeds up indexing.
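In Java this shows up as the difference between `String.length()`, which counts 16-bit code units, and `codePointCount`, which counts actual characters. A small sketch (the class and method names are illustrative only):

```java
public class SurrogateDemo {
    // Number of Unicode code points in s, as opposed to UTF-16 code units.
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "\u00C9";          // É, inside the 16-bit range
        String astral = "\uD83D\uDE00"; // U+1F600, stored as a surrogate pair

        System.out.println(bmp.length() + " unit(s), " + codePoints(bmp) + " code point(s)");       // 1 and 1
        System.out.println(astral.length() + " unit(s), " + codePoints(astral) + " code point(s)"); // 2 and 1
    }
}
```

As long as every character fits in one code unit, `charAt(i)` is the i-th character; once surrogate pairs appear, that assumption no longer holds.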

Main UTF-16 cons:

Lots of null bytes in US-ASCII strings: every ASCII character takes two bytes, one of them zero, which rules out null-terminated strings and wastes a lot of memory.
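The size and zero-byte overhead is easy to measure for a plain ASCII string. A rough sketch, with `ZeroBytes` and `countZeroBytes` as made-up names:

```java
import java.nio.charset.StandardCharsets;

public class ZeroBytes {
    // Counts how many bytes in the array are 0x00.
    public static int countZeroBytes(byte[] bytes) {
        int count = 0;
        for (byte b : bytes) {
            if (b == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String ascii = "hello";
        byte[] utf8 = ascii.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = ascii.getBytes(StandardCharsets.UTF_16LE);

        System.out.println(utf8.length + " bytes, " + countZeroBytes(utf8) + " zero");   // 5 bytes, 0 zero
        System.out.println(utf16.length + " bytes, " + countZeroBytes(utf16) + " zero"); // 10 bytes, 5 zero
    }
}
```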

In general, UTF-16 is usually better for in-memory representation, while UTF-8 is extremely good for text files and network protocols.
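For the situation in the question, the internal UTF-16 representation is usually not the problem; what matters is naming the charset explicitly whenever bytes are turned into a `String` or back. The sketch below assumes the gibberish came from decoding UTF-8 bytes with a different charset (ISO-8859-1 here), which is one common cause but not necessarily what happened in the asker's code. `RoundTrip`, `decodeUtf8`, and `encodeUtf8` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    // Decodes bytes received from the wire, stating the charset explicitly.
    public static String decodeUtf8(byte[] wire) {
        return new String(wire, StandardCharsets.UTF_8);
    }

    // Encodes a string for sending, again stating the charset explicitly.
    public static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] wire = {(byte) 0xC3, (byte) 0x89}; // É as it arrives in a UTF-8 response

        // Correct: decode as UTF-8, process, encode as UTF-8 again.
        String good = decodeUtf8(wire);
        System.out.println(good.equals("\u00C9"));   // true
        System.out.println(encodeUtf8(good).length); // 2, the original C3 89

        // Assumed failure mode: the same bytes decoded with the wrong charset.
        String bad = new String(wire, StandardCharsets.ISO_8859_1);
        System.out.println(bad.length());            // 2 characters instead of 1
        System.out.println(encodeUtf8(bad).length);  // 4 bytes (C3 83 C2 89) sent onward
    }
}
```

Calls such as `new String(bytes)` or `getBytes()` without a charset argument fall back to the platform default, so they are worth checking first when tracking down this kind of corruption.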

102

answered Sep 16 '22 23:09

Arjun Chaudhary

