java.io.UnsupportedEncodingException: cp932?

What type of content would cause this exception?

Caused by: java.io.UnsupportedEncodingException: cp932
        at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:71)
        at java.io.InputStreamReader.<init>(InputStreamReader.java:100)
        at com.google.code.com.sun.mail.handlers.text_plain.getContent(text_plain.java:109)
        at com.google.code.javax.activation.DataSourceDataContentHandler.getContent(DataHandler.java:803)
        at com.google.code.javax.activation.DataHandler.getContent(DataHandler.java:550)
        at com.google.code.javax.mail.internet.MimeBodyPart.getContent(MimeBodyPart.java:639)

And why can't OpenJDK handle this encoding?

Saqib Ali asked Feb 15 '23 13:02


1 Answer

Any text or text-based content that uses that character set / encoding!

According to Wikipedia, CP932 is an extension of Shift JIS, which is one of the encodings used to represent Japanese text.


According to this page, CP932 is in the "Extended Encoding Set (contained in lib/charsets.jar)". If it is not in your install of OpenJDK, look for a yum / apt / whatever OpenJDK package that offers extra Java character set support. Support for CP932 in OpenJDK is definitely available somewhere ...

It is also possible (though IMO unlikely) that OpenJDK doesn't recognize "cp932" as an alias for what it refers to as "MS932" and "windows-31j".


I checked the code.

The issue is that Java (not just OpenJDK!) does not recognize the "cp932" alias at all. The reason it doesn't recognize it is that the alias is non-standard.

The official (IANA endorsed) name for this encoding is "windows-31j", and Java also supports the following aliases by default:

  • "MS932"
  • "windows-932"
  • "csWindows31J"
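To see which of these names your own JVM actually resolves, a small probe like the following may help. It is only a sketch: the class name CharsetProbe and the method probe are made up for illustration, and the result depends on your Java installation (for example, on whether the extended charsets are installed), so no particular output is claimed here.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the JDK or of JavaMail.
class CharsetProbe {

    // Reports, for each candidate name, whether this JVM can resolve it.
    static Map<String, Boolean> probe(String... names) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : names) {
            boolean supported;
            try {
                supported = Charset.isSupported(name);
            } catch (IllegalArgumentException e) {
                // Thrown when the name is null or syntactically illegal.
                supported = false;
            }
            result.put(name, supported);
        }
        return result;
    }

    public static void main(String[] args) {
        probe("windows-31j", "MS932", "windows-932", "csWindows31J", "cp932")
            .forEach((name, ok) -> System.out.println(name + " -> " + ok));
    }
}
```

If a name is supported, Charset.forName(name).aliases() lists every alias the JVM knows for that charset.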

If you set the "sun.nio.cs.map" system property (i.e. using "-D...") to "Windows-31J/Shift_JIS", then Java will also treat "shift-jis", "ms_kanji", "x-sjis", and "csShiftJIS" as names for windows-31j ... but this should only be used for backwards compatibility with old (1.4.0 and earlier) JDKs that didn't implement the real Shift_JIS encoding correctly. (Besides, this doesn't solve your problem ...)

So what can you do?

  • Reject / discard the content as invalid. (And it is.)
  • Find out where this content is coming from, and get them to fix the incorrect encoding name.
  • Intercept the encoding name in the Google code before it tries to use it, and replace the non-standard name with an appropriate standard one.
  • Use nasty reflective hackery to add an encoding alias to the private data structure that the Oracle code uses to look up encodings. (Warning: this may make your application fragile, and lead to portability problems.)
  • Raise an RFE against Java SE requesting an easy way to add aliases for character encodings. (This is a really long-term solution, though you may be able to accelerate it by writing and submitting the proposed enhancement to the OpenJDK team as a patch.)
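As a rough illustration of the "intercept and replace" option, the sketch below maps a non-standard name to a standard one before asking Java for the charset. Everything here is an assumption made for the example: the class CharsetNames, its methods, and the one-entry mapping table are hypothetical, and where you would call this from the mail-handling code is left open.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the JDK or of JavaMail.
class CharsetNames {

    // Assumed mapping from non-standard names to names Java recognizes.
    private static final Map<String, String> NON_STANDARD = new HashMap<>();
    static {
        NON_STANDARD.put("cp932", "windows-31j");
    }

    // Returns the standard name for a declared charset name, or the
    // trimmed input unchanged when no mapping is known.
    static String canonicalName(String declared) {
        String trimmed = declared.trim();
        String key = trimmed.toLowerCase(Locale.ROOT);
        return NON_STANDARD.getOrDefault(key, trimmed);
    }

    // Resolves a declared name to a Charset, using the fallback when the
    // JVM does not support the (mapped) name or the name is illegal.
    static Charset resolve(String declared, Charset fallback) {
        String name = canonicalName(declared);
        try {
            return Charset.isSupported(name) ? Charset.forName(name) : fallback;
        } catch (IllegalArgumentException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(canonicalName("cp932"));
        System.out.println(resolve("cp932", StandardCharsets.UTF_8));
    }
}
```

Whether a silent fallback is acceptable depends on the application; rejecting the content, as in the first option, may be the safer choice when the mapped charset is not installed.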
Stephen C answered Feb 18 '23 03:02