 

How can I convert a String in ASCII (Unicode-escaped) to Unicode (UTF-8) when I am reading it from a file?

EDIT:

I am reading the string from a file, so this topic is really about the following question:

I have this string, which equals() the one received from the file:

"Diogo Pi\\u00e7arra - Tu E Eu"

How can I make Java read the resulting string "\u00e7" as a "ç" character?

This happens because the file does not contain the UTF-8 encoded character itself but an escaped Unicode sequence, which is why I read "\u00e7" as a sequence of separate characters and not as a single Unicode character. So I need a function that parses this at runtime. I could chain .replace() calls to handle each escape, but...
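To make the distinction concrete, here is a minimal sketch (the class, method, and variable names are hypothetical, not from the original post). It compares the text as it arrives from the file with the string the compiler builds from an escape in source code, and shows that re-encoding and decoding the file text leaves the escape untouched:

```java
import java.nio.charset.StandardCharsets;

public class EscapedVsDecoded {
    // Encodes to UTF-8 bytes and decodes them again; this never interprets escape sequences.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What the file yields: a backslash, the letter u, and four hex digits, all as ordinary characters.
        String fromFile = "Pi\\u00e7arra";
        // What the compiler produces from an escape written in source code: one c-cedilla character.
        String compiled = "Pi\u00e7arra";

        System.out.println(fromFile.length());                      // 12
        System.out.println(compiled.length());                      // 7
        System.out.println(roundTrip(fromFile).equals(fromFile));   // true: nothing was unescaped
        System.out.println(fromFile.equals(compiled));              // false
    }
}
```

Because getBytes() followed by new String() only changes the byte representation and back, the six escape characters survive unchanged, whichever charset is used.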


Old Question (asked the wrong way, before I understood what was going on; please ignore the following text):

I have the following string:

final String str = "Diogo Pi\u00e7arra - Tu E Eu";

and I want to convert it to:

"Diogo Piçarra - Tu E Eu"

I have tried everything, from the Apache Commons Lang unescape function to

new String(str.getBytes("UTF-16"), "UTF-16")

or

new String(str.getBytes("UTF-8"), "UTF-8")

or

new String(str.getBytes("UTF-16"))

or

new String(str.getBytes("UTF-8"))

But nothing works...!

What can I try next?

Thanks!

PedroD asked Dec 10 '25 10:12


1 Answer

This is how I got it working when reading from a file that contains explicitly written Unicode escapes:

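    // Needs java.io.*, java.nio.charset.StandardCharsets and Commons Lang's StringEscapeUtils.
    StringBuilder output = new StringBuilder();
    try (BufferedReader reader1 = new BufferedReader(
            new InputStreamReader(file.getInputStream(), StandardCharsets.UTF_8))) {
        int c; // read() returns an int so that -1 (end of stream) cannot collide with a real character
        while ((c = reader1.read()) != -1) {
            output.append((char) c);
        }
    }
    // Convert the literal escape sequences in the text into the characters they denote
    return StringEscapeUtils.unescapeJava(output.toString());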

This works because unescapeJava converts literal escape sequences into the characters they represent:

StringEscapeUtils.unescapeJava("Diogo Pi\\u00e7arra - Tu E Eu")
results in "Diogo Piçarra - Tu E Eu"
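If pulling in Commons Lang is not an option, the same idea can be sketched with the JDK alone. The class and method names below are hypothetical, and this is only a minimal sketch: it assumes the input contains just backslash-u escapes with exactly four hex digits, and unlike unescapeJava it does not handle other escapes or escaped backslashes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeUnescaper {
    // Matches a literal backslash, the letter u, and exactly four hex digits (captured as group 1).
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    public static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous escape and this one unchanged.
            out.append(input, last, m.start());
            // Parse the four hex digits and append the corresponding UTF-16 code unit.
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        // Copy whatever follows the final escape.
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String fromFile = "Diogo Pi\\u00e7arra - Tu E Eu";
        String expected = "Diogo Pi\u00e7arra - Tu E Eu";
        System.out.println(unescape(fromFile).equals(expected)); // true
    }
}
```

Appending each escape as a single char also keeps surrogate pairs intact, since two consecutive escapes simply become two consecutive UTF-16 code units.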
PedroD answered Dec 13 '25 00:12


