I have a Java program that runs msinfo32.exe (system information) in an external process and then reads the file that msinfo32.exe produces. When the Java program loads the file content into a String, the characters are unreadable. To get readable text I have to create the String using String(byte[] bytes, String charsetName) with charsetName set to UTF-16. However, when running on one instance of Windows 2003, only UTF-16LE (little-endian) results in a readable string.
How can I know ahead of time which character encoding to use?
Also, any background information on this topic would be appreciated.
Some Microsoft applications use a byte-order mark to indicate Unicode files and their endianness. I can see on my Windows XP machine that the exported .NFO file starts with 0xFFFE, so it is little-endian.
FF FE 3C 00 3F 00 78 00 6D 00 6C 00 20 00 76 00 __<_?_x_m_l_ _v_
65 00 72 00 73 00 69 00 6F 00 6E 00 3D 00 22 00 e_r_s_i_o_n_=_"_
31 00 2E 00 30 00 22 00 3F 00 3E 00 0D 00 0A 00 1_._0_"_?_>_____
3C 00 4D 00 73 00 49 00 6E 00 66 00 6F 00 3E 00 <_M_s_I_n_f_o_>_
0D 00 0A 00 3C 00 4D 00 65 00 74 00 61 00 64 00 ____<_M_e_t_a_d_
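So you can sniff the first few bytes before choosing a charset. Here is a minimal sketch of that idea; the class and method names (`BomSniffer`, `detectCharset`) are my own, and the fallback charset is whatever you want to assume when no BOM is present:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSniffer {
    // Inspect the first bytes of the file for a Unicode byte-order mark
    // and return the matching charset, or a caller-supplied fallback.
    static Charset detectCharset(byte[] head, Charset fallback) {
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;   // FF FE -> UTF-16 little-endian
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;   // FE FF -> UTF-16 big-endian
        }
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;      // EF BB BF -> UTF-8
        }
        return fallback;                        // no recognizable BOM
    }
}
```

Note that if you decode with UTF-16LE or UTF-16BE explicitly, the BOM is not stripped for you, so skip those leading bytes (or use Java's "UTF-16" charset, which consumes a leading BOM itself).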
Also, I recommend switching to Reader implementations rather than the String constructor for decoding files; this avoids reading half a character when a multi-byte sequence happens to be split at the end of a byte array.
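For example, a sketch of reading the whole file through an InputStreamReader (the file path and class name here are just illustrative). The Reader buffers partial code units internally, so a character split across two reads is never mangled:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class NfoReader {
    // Decode the stream incrementally instead of building a String from
    // one big byte[]. StandardCharsets.UTF_16 honours a leading BOM
    // (and defaults to big-endian when none is present).
    static String readAll(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file),
                                      StandardCharsets.UTF_16))) {
            int ch;
            while ((ch = in.read()) != -1) {
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }
}
```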
You could also try a library that guesses the encoding from the byte content; for instance, I have used this solution once before.