UTF-8 Encoding: Only some Japanese characters are not getting converted

Question

I receive a parameter value from a Jersey web service, and the value is in Japanese characters.

Here, 'japaneseString' is the web service parameter containing the Japanese text.

   String name = new String(japaneseString.getBytes(), "UTF-8");

However, I am able to convert only some strings successfully, while others cause problems.

The following were successfully converted:

 1) アップル
 2) 赤
 3) 世丕且且世两上与丑万丣丕且丗丕
 4) 世世丗丈

While these didn't:

 1) ひほわれよう
 2) 存在する

When I investigated further, I found that these two strings are converted into junk characters.

 1) Input: ひほわれよう        Output : �?��?��?れよ�?�
 2) Input: 存在する            Output: 存在�?�る

Any idea why some of the Japanese characters are not converted properly?

Thanks.

fge · Accepted Answer

You are mixing concepts here.

A String is just a sequence of characters (chars); a String in itself has no encoding at all. For what it's worth, replace characters in the above with carrier pigeons. Same thing. A carrier pigeon has no encoding. Neither does a char. (1)
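To make the point concrete, here is a minimal sketch (the class and method names are made up for illustration). It shows that one and the same String yields different byte sequences depending on the charset you ask for; the encoding belongs to the bytes, not to the String.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NoEncodingDemo {
    // Hypothetical helper: number of bytes the string occupies in a given charset.
    public static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // U+8D64 is the second sample string from the question; written as an
        // escape so the source file's own encoding does not matter.
        String red = "\u8d64";
        System.out.println(encodedLength(red, StandardCharsets.UTF_8));    // prints 3
        System.out.println(encodedLength(red, StandardCharsets.UTF_16BE)); // prints 2
    }
}
```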

What you are doing here:

new String(x.getBytes(), "UTF-8")

is a "poor man's encoding/decoding process". You will probably have noticed that there are two versions of .getBytes(): one where you pass a charset as an argument and the other where you don't.

If you don't, and that is what happens here, it means you will get the result of the encoding process using your default character set; and then you try and re-decode this byte sequence using UTF-8.
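As a rough illustration of why the result depends on the default charset, the sketch below passes the "default" explicitly instead of relying on the platform. The names are hypothetical, and ISO-8859-1 is only an assumed stand-in for a non-UTF-8 default; your actual default may differ and corrupt the text in a different way.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Mimics new String(s.getBytes(), "UTF-8"), with the charset that
    // getBytes() would pick implicitly passed in as 'assumedDefault'.
    public static String roundTrip(String s, Charset assumedDefault) {
        byte[] bytes = s.getBytes(assumedDefault);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The second failing string from the question, written with escapes
        // so the source file's own encoding does not matter.
        String original = "\u5b58\u5728\u3059\u308b";

        // If the default is UTF-8, the round trip is a pointless no-op.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8).equals(original)); // prints true

        // ISO-8859-1 cannot represent these characters, so getBytes() replaces
        // each one with '?' and the original text is lost for good.
        System.out.println(roundTrip(original, StandardCharsets.ISO_8859_1)); // prints ????
    }
}
```

Either way the round trip buys you nothing: at best it returns the string unchanged, at worst it destroys it.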

Don't do that. Just take in the string as it comes. If, however, you have trouble reading the original byte stream into a string, it means you are using a Reader with the wrong charset. Fix that part.
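A minimal sketch of "fixing that part", assuming the incoming bytes really are UTF-8 (the class and method names are invented for the example, and where the bytes come from in a Jersey application is not shown here). The only thing that changes between the correct and the broken read is the charset handed to the Reader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReaderCharsetDemo {
    // Decodes the whole byte array through a Reader using the given charset.
    public static String readAll(byte[] data, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u5b58\u5728\u3059\u308b"; // second failing string from the question
        byte[] wire = original.getBytes(StandardCharsets.UTF_8); // 12 bytes "on the wire"

        // Right charset: the text comes back intact.
        System.out.println(readAll(wire, StandardCharsets.UTF_8).equals(original)); // prints true

        // Wrong charset: every byte is decoded as a separate character.
        System.out.println(readAll(wire, StandardCharsets.ISO_8859_1).length()); // prints 12
    }
}
```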

For more information, read this link.

(1) The fact that a char is actually a UTF-16 code unit is irrelevant to this discussion.

Tags: java, character-encoding, encoding, utf-8, utf

Janak
