
Convert ANSI characters to UTF-8 in Java

Is there a way to convert an ANSI string to UTF-8 using Java?

I have a custom serializer that uses the readUTF and writeUTF methods of the DataInputStream and DataOutputStream classes to deserialize and serialize strings. If I receive a string that is encoded in ANSI and is too long (around 100,000 characters), I get this error:

Caused by: java.io.UTFDataFormatException: encoded string too long: 106958 bytes

However, in my JUnit tests I am able to create a string of 120,000 'a's and it works perfectly.

I have checked the following posts but am still getting errors:

  • Converting UTF-8 to ISO-8859-1 in Java - how to keep it as single byte
  • How do I replace accented Latin characters in Ruby?
n002213f asked Dec 18 '22 05:12



1 Answer

This error is not caused by character encoding. It means the length of the UTF data is wrong.

EDIT: Just realized this is a write error, not a read error.

The UTF length prefix is only 2 bytes, so writeUTF can hold at most 65,535 bytes (64K) of encoded UTF-8 data. You are trying to write about 100K, which is not going to work.
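To see the limit in action, here is a minimal sketch. The class name WriteUtfLimitDemo and the helper canWriteUtf are made up for illustration; only DataOutputStream.writeUTF and UTFDataFormatException come from the JDK. It also shows that the limit counts encoded bytes, not characters, so a string of 2-byte characters hits it at roughly half the length.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

public class WriteUtfLimitDemo {

    // Hypothetical helper: returns true if writeUTF accepts a string made of
    // `length` copies of `c`, false if it rejects it as too long.
    static boolean canWriteUtf(char c, int length) throws IOException {
        char[] chars = new char[length];
        Arrays.fill(chars, c);
        String s = new String(chars);
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            // Thrown when the encoded form needs more than 65535 bytes.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // 'a' encodes to 1 byte, so 65535 of them fit exactly.
        System.out.println(canWriteUtf('a', 65535));
        // One more byte exceeds what the 2-byte length prefix can record.
        System.out.println(canWriteUtf('a', 65536));
        // '\u00e9' encodes to 2 bytes, so 40000 of them need 80000 bytes.
        System.out.println(canWriteUtf('\u00e9', 40000));
    }
}
```

Going by that check, a string of 120,000 'a's should not pass through writeUTF either, so the JUnit test in the question may be exercising a different code path.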

This limit is hardcoded in writeUTF, and there is no way to get around it:

if (utflen > 65535)
    throw new UTFDataFormatException(
            "encoded string too long: " + utflen + " bytes");
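Since the check sits inside writeUTF itself, one common alternative for a custom serializer is to skip writeUTF for long strings and write the bytes with a 4-byte length prefix instead. The sketch below assumes you control both the writing and the reading side. The class LongStringIo and its two methods are hypothetical names, and the format is standard UTF-8 rather than the modified UTF-8 that readUTF expects, so the two are not interchangeable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LongStringIo {

    // Writes a 4-byte length followed by the UTF-8 bytes of the string.
    static void writeLongString(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Reads a string written by writeLongString.
    static String readLongString(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("negative string length: " + length);
        }
        byte[] bytes = new byte[length];
        in.readFully(bytes); // blocks until all bytes are read or throws EOFException
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        char[] chars = new char[120000];
        Arrays.fill(chars, 'a');
        String original = new String(chars);

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeLongString(new DataOutputStream(buffer), original);

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(original.equals(readLongString(in)));
    }
}
```

If the incoming bytes really are in an ANSI code page, they would need to be decoded with the matching charset when the String is first built; by the time writeUTF sees it, the String is already UTF-16 internally and the source encoding no longer matters.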
ZZ Coder answered Jan 09 '23 09:01
