Some legacy code relies on the platform's default charset for translations. For Windows and Linux installations in the "western world" I know what that means. But thinking about Russian or Asian platforms I am totally unsure what their platform's default charset is (just UTF-16?).
Therefore I would like to know what I would get when executing the following code line:
System.out.println("Default Charset=" + Charset.defaultCharset());
PS:
I don't want to discuss the problems of charsets and their difference to Unicode here. I just want to collect what operating systems will result in what specific charset. Please post only concrete values!
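For reference, here is the quoted line as a minimal complete program (the class name `DefaultCharsetCheck` is just illustrative):

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    public static void main(String[] args) {
        // Prints the JVM's default charset, e.g. "UTF-8" or "windows-1252"
        System.out.println("Default Charset=" + Charset.defaultCharset());
    }
}
```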
The native character encoding of the Java programming language is UTF-16. A charset in the Java platform therefore defines a mapping between sequences of sixteen-bit UTF-16 code units (that is, sequences of chars) and sequences of bytes.
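To illustrate with a small example of my own: since a char is one UTF-16 code unit, a character outside the Basic Multilingual Plane occupies two chars (a surrogate pair):

```java
public class Utf16Units {
    public static void main(String[] args) {
        String s = "\uD83D\uDE00"; // U+1F600 GRINNING FACE as a surrogate pair
        System.out.println(s.length());                      // 2 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length())); // 1 Unicode code point
    }
}
```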
The default character encoding for Android is UTF-8, as specified in the Javadoc of the Charset class.
That's a user-specific setting. On many modern Linux systems it's UTF-8. On older Macs it's MacRoman (recent JVMs on macOS default to UTF-8). On Windows in the US and Western Europe it's typically Cp1252 (windows-1252); in Central and Eastern Europe it's Cp1250. In China you often find GBK or GB18030 for simplified Chinese, and Big5 for traditional Chinese.
But that's only the system default, and each user can change it at any time. Which suggests a solution: set the encoding explicitly when you start your app, using the system property file.encoding.
See this answer for how to do that. I suggest putting it into a small script that starts your app, so the user's default setting isn't relied on.
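A sketch of that approach (the class name is illustrative): pass -Dfile.encoding on the java command line, before the JVM starts, because the default charset is fixed at startup:

```java
import java.nio.charset.Charset;

public class EncodingReport {
    // Launch with an explicit encoding, e.g.:
    //   java -Dfile.encoding=UTF-8 EncodingReport
    public static void main(String[] args) {
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset = " + Charset.defaultCharset());
    }
}
```

Note that calling System.setProperty("file.encoding", ...) at runtime is too late: Charset.defaultCharset() reads the value once during JVM startup.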
For Windows and Linux installations in the "western world" I know what that means.
Probably not as well as you think.
But thinking about Russian or Asian platforms I am totally unsure what their platform's default charset is
Usually it's whatever encoding is historically used in their country.
(just UTF-16?).
Most definitely not. Computer usage spread widely before the Unicode standard existed, and each language area developed one or more encodings that could support its language. Those who needed less than 128 characters outside ASCII typically developed an "extended ASCII", many of which were eventually standardized as ISO-8859, while others developed two-byte encodings, often several competing ones. For example, in Japan, emails typically use JIS, but webpages use Shift-JIS, and some applications use EUC-JP. Any of these might be encountered as the platform default encoding in Java.
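As a sketch of how these Japanese encodings differ (assuming the JDK's extended charsets Shift_JIS, ISO-2022-JP, and EUC-JP are available, as they are in a standard JDK):

```java
public class JapaneseEncodings {
    public static void main(String[] args) throws Exception {
        String text = "文字"; // "characters" in Japanese
        byte[] sjis = text.getBytes("Shift_JIS");
        byte[] jis  = text.getBytes("ISO-2022-JP"); // the "JIS" used in email
        System.out.println("Shift_JIS:   " + sjis.length + " bytes");
        System.out.println("ISO-2022-JP: " + jis.length + " bytes (includes escape sequences)");
        // Round-tripping with the right charset restores the text...
        System.out.println(new String(sjis, "Shift_JIS").equals(text)); // true
        // ...but decoding the same bytes with the wrong charset produces mojibake
        System.out.println(new String(sjis, "EUC-JP").equals(text));    // false
    }
}
```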
It's all a huge mess, which is exactly why Unicode was developed. But the mess has not yet disappeared; we still have to deal with it, and we should make no assumptions about the encoding of a given bunch of bytes that are to be interpreted as text. There Ain't No Such Thing as Plain Text.