Java charset and Windows

Tags: java, windows

I have a Java program that runs msinfo32.exe (System Information) in an external process and then reads the file that msinfo32.exe produces. When the program loads the file content into a String, the characters are unreadable. To get a readable String I have to construct it with String(byte[] bytes, String charsetName) and set charsetName to UTF-16. However, on one instance of Windows 2003, only UTF-16LE (little-endian) produces a printable string.

How can I know ahead of time which character encoding to use?

Also, any background information on this topic would be appreciated.

Mike asked Mar 01 '23 01:03


2 Answers

Some Microsoft applications write a byte-order mark (BOM) at the start of Unicode files to indicate their encoding and endianness. On my Windows XP machine, the exported .NFO file starts with the bytes FF FE, which is the BOM U+FEFF stored in little-endian order, so the file is UTF-16LE.

FF FE 3C 00 3F 00 78 00 6D 00 6C 00 20 00 76 00         __<_?_x_m_l_ _v_
65 00 72 00 73 00 69 00 6F 00 6E 00 3D 00 22 00         e_r_s_i_o_n_=_"_
31 00 2E 00 30 00 22 00 3F 00 3E 00 0D 00 0A 00         1_._0_"_?_>_____
3C 00 4D 00 73 00 49 00 6E 00 66 00 6F 00 3E 00         <_M_s_I_n_f_o_>_
0D 00 0A 00 3C 00 4D 00 65 00 74 00 61 00 64 00         ____<_M_e_t_a_d_
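
A minimal sketch of checking for a byte-order mark before picking a charset, in case it helps. The class and method names (BomSniffer, detect) are hypothetical, and the fallback when no BOM is present is left to the caller, since nothing here can tell what a BOM-less file contains. It also does not distinguish the UTF-32LE mark (FF FE 00 00), which begins with the same two bytes as the UTF-16LE one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
class BomSniffer {

    /** Returns the charset indicated by a leading BOM, or null if none is recognized. */
    static Charset detect(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            if (b0 == 0xFF && b1 == 0xFE) {
                return StandardCharsets.UTF_16LE; // FF FE: little-endian
            }
            if (b0 == 0xFE && b1 == 0xFF) {
                return StandardCharsets.UTF_16BE; // FE FF: big-endian
            }
        }
        return null; // no BOM: the caller has to choose a default
    }

    public static void main(String[] args) {
        // First four bytes of the dump above: FF FE 3C 00
        byte[] head = {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00};
        System.out.println(detect(head).name()); // prints UTF-16LE
    }
}
```

Note that when decoding, Java's "UTF-16" charset already honors a leading BOM and assumes big-endian if there is none, which may be why a file written without a BOM only decodes correctly when UTF-16LE is named explicitly.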

Also, I recommend decoding files with Reader implementations rather than the String constructor; this avoids problems where a character is cut in half because its bytes straddle the end of a byte array.
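
As a rough illustration of the Reader approach, the sketch below decodes a stream through an InputStreamReader, which carries partial characters across read boundaries internally. The class and method names (NfoReader, readAll) are made up for this example, and the charset is passed in by the caller rather than detected.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
class NfoReader {

    /** Decodes the whole stream with the given charset and closes it. */
    static String readAll(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buffer = new char[4096];
            int count;
            // The reader hands back whole chars; byte boundaries are its concern.
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // FF FE BOM followed by "<?" in UTF-16LE, as in the dump above.
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00, 0x3F, 0x00};
        // "UTF-16" consumes the BOM and decodes little-endian accordingly.
        System.out.println(readAll(new ByteArrayInputStream(sample), StandardCharsets.UTF_16));
    }
}
```

For a real file, the same method could be given a FileInputStream. Decoding with UTF-16LE instead of UTF-16 would leave the BOM in the result as a leading U+FEFF character, which the caller would then need to strip.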

McDowell answered Mar 11 '23 04:03


You could try a library that guesses the encoding; for instance, I once used this solution.

Fabian Steeg answered Mar 11 '23 04:03