
Platform's default charset on different platforms?

Some legacy code relies on the platform's default charset for translations. For Windows and Linux installations in the "western world" I know what that means. But thinking about Russian or Asian platforms I am totally unsure what their platform's default charset is (just UTF-16?).

Therefore I would like to know what I would get when executing the following code line:

System.out.println("Default Charset=" + Charset.defaultCharset());
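
To make reported values easier to compare, a self-contained variant of that line could also print the settings that usually influence the result. This is only a sketch: the class name `DefaultCharsetReport` and the choice of extra properties are illustrative assumptions, not part of the legacy code.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class DefaultCharsetReport {

    // Builds a one-line report of the default charset and related settings.
    static String report() {
        return "Default Charset=" + Charset.defaultCharset()
                + ", file.encoding=" + System.getProperty("file.encoding")
                + ", os.name=" + System.getProperty("os.name")
                + ", locale=" + Locale.getDefault();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```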

PS:

I don't want to discuss the problems of charsets and how they differ from Unicode here. I just want to collect which operating systems result in which specific charset. Please post only concrete values!

asked Feb 16 '12 14:02 by Robert



2 Answers

That's a user-specific setting. On many modern Linux systems, it's UTF-8. On Macs, it's MacRoman. On Windows in the US and Western Europe, it's usually CP1252; in Central Europe, it's CP1250. In China, you often find a GB* encoding for simplified Chinese; Big5 is used for traditional Chinese, for example in Taiwan.

But that's only the system default, which each user can change at any time. That is probably also the solution: set the encoding when you start your app using the system property file.encoding.

See this answer for how to do that. I suggest putting this into a small script that starts your app, so the user's default isn't affected.
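
Where the legacy code can be changed, an alternative is to stop relying on the default at all and name the charset on every conversion. The following is a minimal sketch of that idea; the class and method names are made up for illustration, and UTF-8 is only an example choice.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {

    // Encodes and decodes with a named charset, so the platform default
    // is never consulted.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String sample = "Gr\u00FC\u00DFe"; // "Grüße", written with escapes
        // The result is the same on every platform because the charset is explicit.
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8).equals(sample));
    }
}
```

The same text passed through `text.getBytes()` and `new String(bytes)` without a charset argument would depend on whatever the platform default happens to be.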

answered Sep 21 '22 00:09 by Aaron Digulla


For Windows and Linux installations in the "western world" I know what that means.

Probably not as well as you think.

But thinking about Russian or Asian platforms I am totally unsure what their platform's default charset is

Usually it's whatever encoding is historically used in their country.

(just UTF-16?).

Most definitely not. Computer usage spread widely before the Unicode standard existed, and each language area developed one or more encodings that could support its language. Those who needed fewer than 128 characters outside ASCII typically developed an "extended ASCII", many of which were eventually standardized as ISO-8859, while others developed two-byte encodings, often several competing ones. For example, in Japan, emails typically use JIS, but webpages use Shift-JIS, and some applications use EUC-JP. Any of these might be encountered as the platform default encoding in Java.

It's all a huge mess, which is exactly why Unicode was developed. But the mess has not yet disappeared, and we still have to deal with it. We should not make any assumptions about the encoding of a given bunch of bytes that is to be interpreted as text. There Ain't No Such Thing as Plain Text.
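
To illustrate why such assumptions go wrong, here is a small sketch that encodes a string as UTF-8 and then decodes the bytes as ISO-8859-1. The class and method names are hypothetical. Both charsets are among those every Java implementation is required to support.

```java
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {

    // Encodes text as UTF-8, then decodes the bytes with the wrong charset.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E9 becomes the two bytes C3 A9 in UTF-8; read as ISO-8859-1,
        // those bytes turn into the two characters U+00C3 and U+00A9.
        String garbled = misdecode("\u00E9");
        System.out.println(garbled.length());
    }
}
```

The bytes themselves carry no hint of which interpretation is correct, so the encoding has to be known from somewhere else.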

answered Sep 19 '22 00:09 by Michael Borgwardt