In my application I get user info from LDAP, and sometimes the full user name arrives in the wrong charset. For example:
ТеÑÑ61 ТеÑÑовиÑ61
It can also arrive in English or in Russian and be displayed correctly. If the user name changes, it is updated in the database. Changing the value in the db by hand won't solve the problem, because it gets overwritten on the next update.
I can fix it before saving by doing this:
new String(incorrect.getBytes("ISO-8859-1"), "UTF-8");
However, if I apply it to a string that already contains correct Russian characters (e.g. "Тест61 Тестович61"), I get something like "????61 ????????61".
Can you please suggest something that can determine the charset of a string?
In MySQL, to find out what character set or collation a string has, use the CHARSET() or COLLATION() function.
To get the name of the character set, which can be used as an encoding name in Java, call getName() on the CharsetMatch returned by ICU4J's detector: CharsetMatch match = ...; byte[] characterData = ...; String charsetName = match.getName(); String unicodeData = new String(characterData, charsetName);
The native character encoding of the Java programming language is UTF-16. A charset in the Java platform therefore defines a mapping between sequences of sixteen-bit UTF-16 code units (that is, sequences of chars) and sequences of bytes.
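This also explains the "????" in the question. A minimal sketch (the class and method names are made up for illustration): Cyrillic characters have no mapping in ISO-8859-1, so getBytes() substitutes the byte for '?' for each of them, and the original characters are gone before the UTF-8 decode ever runs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripDemo {
    // The "fix" from the question: encode as ISO-8859-1, then decode as UTF-8.
    static String latin1ToUtf8(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Тест61", written with Unicode escapes so the source file encoding does not matter.
        String russian = "\u0422\u0435\u0441\u044261";

        // The same String maps to different byte sequences depending on the charset.
        System.out.println(Arrays.toString(russian.getBytes(StandardCharsets.UTF_8)));
        System.out.println(Arrays.toString(russian.getBytes(StandardCharsets.ISO_8859_1)));

        // Unmappable characters were replaced by '?', so this prints ????61
        System.out.println(latin1ToUtf8(russian));
    }
}
```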
Valid UTF-8 has a specific binary format. A single-byte UTF-8 character is always of the form '0xxxxxxx', where 'x' is any binary digit. A two-byte UTF-8 character is always of the form '110xxxxx 10xxxxxx'. Longer sequences follow the same pattern: a lead byte followed by continuation bytes of the form '10xxxxxx'.
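You don't have to check the bit patterns by hand. As a rough sketch (the class and method names are hypothetical), the JDK's own CharsetDecoder can be told to report malformed input instead of silently replacing it, which gives a strict "is this valid UTF-8?" test for a byte array:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Returns true if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = "\u0422\u0435\u0441\u044261".getBytes(StandardCharsets.UTF_8);
        // 0xD0 is a two-byte lead byte, but 0x41 ('A') is not a continuation byte.
        byte[] bad = {(byte) 0xD0, (byte) 0x41};
        System.out.println(isValidUtf8(good));
        System.out.println(isValidUtf8(bad));
    }
}
```

Keep in mind that passing this check only means the bytes could be UTF-8. Plain ASCII passes too, and so may short inputs in other encodings.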
Strings in Java, AFAIK, do not retain their original encoding - they are always stored internally in some Unicode form. What you want is to detect the charset of the original stream/bytes - this is why I think your String.getBytes() call is too late.
Ideally if you could get the input stream you are reading from, you can run it through something like this: http://code.google.com/p/juniversalchardet/
There are plenty of other charset detectors out there as well.
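If you can't get at the raw bytes and only ever see the already-decoded String, a heuristic can at least avoid destroying names that are already correct. This is only a sketch under the assumption that the broken values are always UTF-8 bytes misread as ISO-8859-1, as in the question; the class and method names are made up. The idea is to apply the conversion only when every character fits in ISO-8859-1 and the resulting bytes are strictly valid UTF-8:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class MojibakeRepair {
    // Re-decodes the string as UTF-8 only if it looks like UTF-8 misread as ISO-8859-1.
    static String repairIfMojibake(String s) {
        // A char above U+00FF cannot come from an ISO-8859-1 decode, so the string
        // (for example, correct Cyrillic) is returned untouched.
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return s;
            }
        }
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1); // lossless at this point
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so it was probably genuine Latin-1 text; keep it.
            return s;
        }
    }

    public static void main(String[] args) {
        String correct = "\u0422\u0435\u0441\u044261"; // "Тест61"
        // Simulate the broken value: UTF-8 bytes decoded as ISO-8859-1.
        String broken = new String(correct.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(repairIfMojibake(broken).equals(correct));
        System.out.println(repairIfMojibake(correct).equals(correct));
    }
}
```

It is still a guess: a genuine Latin-1 string that happens to form valid UTF-8 byte sequences would be "repaired" incorrectly, so fixing the decoding where the LDAP bytes are first read remains the better option.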