
Java convert Windows-1252 to UTF-8, some letters are wrong

I receive data from an external Microsoft SQL Server 2008 database (I run the queries with MyBatis). The data is encoded as "Windows-1252".

I have tried to re-encode to UTF-8:

String textoFormado = ...value from MyBatis... ; 
String s = new String(textoFormado.getBytes("Windows-1252"), "UTF-8");

Almost the whole string is correctly decoded, but some letters with accents are not.

For example:

  1. I received this: �vila
  2. The code above makes: �?vila
  3. I expected: Ávila
Ramon asked Apr 15 '14 11:04


2 Answers

Why not tackle the issue at a lower level, by reading the String in the proper encoding from your database?

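Many JDBC drivers accept a character-encoding property in the connection string or URI; MySQL Connector/J, for example, calls it characterEncoding.

So in your Microsoft SQL Server case you could try, for example, jdbc:sqlserver://localhost:52865;databaseName=myDb?characterEncoding=utf8. Check your driver's documentation first, though: the property name and URL syntax are driver-specific, and the Microsoft driver separates properties with semicolons rather than a question mark.

If the driver honors the setting, each String column is read in the specified encoding, with no need to convert it manually afterwards.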

See also:

  • JDBC character encoding
  • Problems reading/writing UTF-8 data in MySQL from Java using JDBC connector 5.1
hc_dev answered Sep 21 '22 11:09


Obviously, textoFormado is a variable of type String, which means the bytes have already been decoded: a Java String is a sequence of UTF-16 code units, not bytes in some external encoding. What you did was encode the string as Windows-1252 and then read the resulting bytes as UTF-8. That does not work, because the single bytes that Windows-1252 uses for accented letters are generally not valid UTF-8 sequences.
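To see why the round trip fails, here is a minimal sketch that reproduces it on a string that was decoded correctly to begin with. The class and method names (ReencodeDemo, brokenRoundTrip) are made up for illustration; only the String and Charset calls are standard Java.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReencodeDemo {
    // "windows-1252" is the canonical Java name for this charset.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Same pattern as in the question: encode as Windows-1252, then decode as UTF-8.
    static String brokenRoundTrip(String alreadyDecoded) {
        byte[] cp1252Bytes = alreadyDecoded.getBytes(CP1252);
        return new String(cp1252Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00C1vila"; // "Ávila", escaped so the source file encoding does not matter
        String mangled = brokenRoundTrip(original);

        // The accented letter becomes the single byte 0xC1, which is not valid UTF-8,
        // so the decoder substitutes the replacement character U+FFFD.
        System.out.println(original.equals(mangled));       // false
        System.out.println(mangled.indexOf('\uFFFD') >= 0); // true
    }
}
```

Plain ASCII text survives this round trip unchanged, which is probably why almost the whole string looked fine and only the accented letters broke.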

What you need is the correct encoding when reading the bytes:

byte[] sourceBytes = getRawBytes(); // the raw, still-encoded bytes from your source
String data = new String(sourceBytes, "Windows-1252");
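If the bytes arrive as a stream rather than an array, the same idea applies: tell the reader which charset the source uses. The following is only a sketch under that assumption; DecodeAtSourceDemo and readFirstLine are hypothetical names, and the ByteArrayInputStream stands in for whatever actually delivers your raw bytes.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class DecodeAtSourceDemo {
    // Decodes the stream with the given charset and returns its first line.
    static String readFirstLine(InputStream in, Charset charset) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "Ávila" as Windows-1252 bytes: 0xC1 is the single byte for the accented A.
        byte[] raw = {(byte) 0xC1, 'v', 'i', 'l', 'a'};
        String data = readFirstLine(new ByteArrayInputStream(raw), Charset.forName("windows-1252"));
        System.out.println(data.equals("\u00C1vila")); // true
    }
}
```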

To use this string inside your program, you do not need to do anything else; simply use it. If, however, you want to write the data back to a file, for example, you need to encode it again:

byte[] destinationBytes = data.getBytes("UTF-8");
// write bytes to destination file here
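A possible way to fill in the "write bytes" step, assuming a plain file on disk is the destination. WriteUtf8Demo and writeAndReadBack are illustrative names, and the temporary file is there only so the sketch can run anywhere and check its own output.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class WriteUtf8Demo {
    // Writes the string as UTF-8 to a temporary file and returns the bytes that landed on disk.
    static byte[] writeAndReadBack(String data) {
        try {
            Path target = Files.createTempFile("utf8-demo", ".txt");
            try {
                Files.write(target, data.getBytes(StandardCharsets.UTF_8));
                return Files.readAllBytes(target);
            } finally {
                Files.deleteIfExists(target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] written = writeAndReadBack("\u00C1vila");
        // The accented A takes two bytes in UTF-8 (0xC3 0x81), so five characters become six bytes.
        System.out.println(written.length); // 6
    }
}
```

Passing StandardCharsets.UTF_8 instead of the string "UTF-8" avoids the checked UnsupportedEncodingException.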
Seelenvirtuose answered Sep 19 '22 11:09