Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Japanese Character Encoding in Java

Tags:

java

unicode

cjk

Here's my problem. I'm now using using Java Apache POI to read an Excel (.xls or .xlsx) file, and display the contents. There are some Japanese chars in the spreadsheet and all of the Japanese chars I got are "???" in my output. I tried to use Shift-JIS, UTF-8 and many other encoding ways, but it doesn't work... Here's my encoding code below:

public String encoding(String str) throws UnsupportedEncodingException{
  String Encoding = "Shift_JIS";
  return this.changeCharset(str, Encoding);
}
public String changeCharset(String str, String newCharset) throws UnsupportedEncodingException {
  if (str != null) {
    byte[] bs = str.getBytes();
    return new String(bs, newCharset);
  }
  return null;
}

I am passing in every string I got to encoding(str). But when I print the return value, it's still something like "???" (Like below) but not Japanese characters (Hiragana, Katakana or Kanji).

title-jp=???

Anyone can help me with this? Thank you so much.

like image 711
Allan Jiang Avatar asked Oct 09 '22 19:10

Allan Jiang


1 Answers

Your changeCharset method seems strange. String objects in Java are best thought of as not have a specific character set. They use Unicode and so can represent all characters, not only one regional subset. Your method says: turn the string into bytes using my system's character set (whatever that may be), and then try and interpret those bytes using some other character set (specified in newCharset), which therefore probably won't work. If you convert to bytes in an encoding, you should read those bytes with the same encoding.

Update:

To convert a String to Shift-JIS (a regional encoding commonly used in Japan) you can say:

byte[] jis = str.getBytes("Shift_JIS");

If you write those bytes into a file, and then open the file in Notepad on a Windows computer where the regional settings are all Japan-centric, Notepad will display it in Japanese (having nothing else to go on, it will assume the text is in the system's local encoding).

However, you could equally well save it as UTF-8 (prefixed with the 3-byte UTF-8 introducer sequence) and Notepad will also display it as Japanese. Shift-JIS is only one way of representing Japanese text as bytes.

like image 51
Daniel Earwicker Avatar answered Oct 13 '22 11:10

Daniel Earwicker