I have read that it's a bad idea to use the platform default character encoding, for example when reading a text file and importing the text into arrays. Could you explain how that could affect cross-platform performance, and how to get past that problem? Is there an encoding that should be used for cross-platform applications? Thanks
It's not about performance, but about showing and reading properly encoded text. There are a number of ways to cope with the problem:
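- Set the JVM option -Dfile.encoding=utf-8 so that the default encoding is UTF-8.
- Specify the encoding explicitly wherever an API accepts one: String, Reader, Writer and more.

I think the latter is a must. If you always set the JVM option, it will work, but if you forget to set it at some point, there will be unexpected failures in random places.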
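As a minimal sketch of the "explicit encoding" approach, the class below passes StandardCharsets.UTF_8 to every reader, writer and String conversion. The class and method names (ExplicitCharsetExample, writeLines, readLines) are made up for illustration; non-ASCII characters are written as Unicode escapes so the example does not depend on the encoding of the source file itself.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class ExplicitCharsetExample {

    // Writes the lines as UTF-8, whatever the platform default is.
    static void writeLines(Path file, List<String> lines) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    // Reads the file back as UTF-8 into a list (one entry per line).
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("charset-demo", ".txt");
        // \u00e4 \u00f6 \u00fc are the umlauts a, o, u; \u20ac is the Euro sign.
        List<String> original = List.of("\u00e4 \u00f6 \u00fc", "price: 5 \u20ac");
        writeLines(file, original);
        System.out.println(readLines(file).equals(original));

        // The same rule applies to String <-> byte[] conversions.
        byte[] bytes = "\u20ac".getBytes(StandardCharsets.UTF_8);
        String back = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(back.equals("\u20ac"));

        Files.delete(file);
    }
}
```

Because the charset is named at each call site, the result does not change if someone forgets the JVM option.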
As for the other question: stick to UTF-8.
See also this question.
Usually it's no problem if the files you read and write are not exchanged between platforms. But if you have, for example, a configuration file created on Windows (windows-1252, similar to ISO-8859-1) and then start your app on a recent Linux (UTF-8), the config file will have problems with nearly all characters above 127 (like the German umlauts ä, ö, ü, the € sign, or similar characters).
In this case just decide on one encoding, always specify it, and stick with it. If you only use plain ASCII (not extended Latin!) files, you won't have problems.
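The Windows-to-Linux scenario above can be reproduced in memory. This is only a sketch: the class name EncodingMismatchDemo and the method decodeWith are invented here, and it assumes the JVM ships the windows-1252 charset (common, but not guaranteed by the Java specification).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
public class EncodingMismatchDemo {

    // Encodes text with one charset and decodes the bytes with another,
    // which is what happens when a file moves between differently
    // configured platforms and both sides rely on their default.
    static String decodeWith(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // Assumption: windows-1252 is available on this JVM.
        Charset win1252 = Charset.forName("windows-1252");
        String word = "f\u00fcr"; // "fuer" spelled with u-umlaut

        // Written on Windows, read as UTF-8: the umlaut byte 0xFC is not
        // valid UTF-8, so it becomes the replacement character U+FFFD.
        String readOnLinux = decodeWith(word, win1252, StandardCharsets.UTF_8);
        System.out.println(readOnLinux.equals("f\uFFFDr"));

        // Written as UTF-8, read as windows-1252: the two bytes C3 BC
        // show up as two separate characters (U+00C3 and U+00BC).
        String readOnWindows = decodeWith(word, StandardCharsets.UTF_8, win1252);
        System.out.println(readOnWindows.equals("f\u00c3\u00bcr"));

        // Same charset on both sides: the text survives.
        System.out.println(decodeWith(word, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(word));
    }
}
```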
The default encoding varies from OS to OS and even between users on the same machine in the case of some multilingual installs. This means that character data written by the application will vary and may be unreadable or appear corrupt if read using a different default encoding. The Euro character (€) will encode as the byte 80 under windows-1252, A4 under ISO-8859-15 and the bytes E2 82 AC under UTF-8.
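The byte values quoted above can be checked with a few lines of Java. This is a sketch with invented names (EuroBytesDemo, hex), and it assumes the windows-1252 and ISO-8859-15 charsets are installed in the JVM, which is typical but not required by the specification.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
public class EuroBytesDemo {

    // Returns the encoded bytes of text as upper-case hex pairs
    // separated by spaces, e.g. "E2 82 AC".
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20ac"; // the Euro sign, written as an escape

        // Assumption: both legacy charsets are available on this JVM.
        System.out.println("windows-1252: " + hex(euro, Charset.forName("windows-1252")));
        System.out.println("ISO-8859-15:  " + hex(euro, Charset.forName("ISO-8859-15")));
        System.out.println("UTF-8:        " + hex(euro, StandardCharsets.UTF_8));
    }
}
```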
Legacy encodings can cause data loss since many of them only support a narrow range of code points.
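A small sketch of that data loss, using ISO-8859-1 as the legacy encoding (the class name LossyEncodingDemo and the method roundTrip are made up for this example). String.getBytes silently substitutes the charset's replacement byte for characters it cannot represent, so the Euro sign does not survive the round trip.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
public class LossyEncodingDemo {

    // Encodes and immediately decodes the text with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String price = "5 \u20ac"; // "5 " followed by the Euro sign

        // ISO-8859-1 has no Euro sign; getBytes replaces it with '?'.
        System.out.println(roundTrip(price, StandardCharsets.ISO_8859_1).equals("5 ?"));

        // UTF-8 can represent every Unicode code point, so nothing is lost.
        System.out.println(roundTrip(price, StandardCharsets.UTF_8).equals(price));

        // An encoder can tell you up front whether a character is representable.
        System.out.println(StandardCharsets.ISO_8859_1.newEncoder().canEncode('\u20ac'));
    }
}
```

Checking canEncode before writing is one way to detect the problem instead of discovering question marks in the output later.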
The only supported way to change the default encoding is to change it in the operating system.
It is generally better to be explicit in choosing encodings and to prefer a lossless Unicode encoding (usually UTF-8). The decision to make "ANSI" encodings the default on Windows, for example, made more sense when supporting Windows 95.