 

Invalid byte 2 of 4-byte UTF-8 sequence, but only when executing JAR?

I have a Java program that takes an XML string from a SQL Server database, transforms it with TransformerFactory, writes it to a file, and then uses that file to generate a PDF.

It works fine when I run it from NetBeans, but when I run the JAR in the project's dist folder I get "Invalid byte 2 of 4-byte UTF-8 sequence".

After converting the XML string to UTF-8, it now works from the JAR too.

So my question is: why did it work when running the project in NetBeans, but not from the JAR, before I changed the encoding?

I have only tried this on Windows.

Code:

Here is the code that reads the XML from the SQL Server result set (original):

SQLXML xml = null;
String xmlString = "";
while (rs.next()){
    xml = rs.getSQLXML(1);
    xmlString = xml.getString();
}
return xmlString;

...and the modified version:

SQLXML xml = null;
String xmlString = "";
while (rs.next()){
    xml = rs.getSQLXML(1);
    // Note explicit UTF-8 encoding specified
    // (getBytes() with no argument encodes with the platform default charset)
    xmlString = new String(xml.getString().getBytes(), "UTF8");
}
return xmlString;
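One way to see what the modified line actually does is to simulate the platform default charset explicitly instead of relying on the machine's setting. This is a sketch: the class name, the `reencode` helper and the sample text "Canción" are hypothetical. When the default is UTF-8, the encode/decode round trip changes nothing. When it is windows-1252, "ó" becomes the single byte 0xF3, which is not valid UTF-8 in that position, so the decoder substitutes U+FFFD. If that is what happens on Windows, the modification probably makes the error go away by replacing non-ASCII characters rather than preserving them, so it may be worth checking the generated PDF for lost accents.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {

    // Mimics `new String(s.getBytes(), "UTF8")`, but takes the "default" charset as a
    // parameter so the result does not depend on the machine running it.
    public static String reencode(String s, Charset assumedDefault) {
        byte[] bytes = s.getBytes(assumedDefault);        // encode with the simulated default
        return new String(bytes, StandardCharsets.UTF_8); // decode as UTF-8; malformed input becomes U+FFFD
    }

    public static void main(String[] args) {
        // U+00F3 is o-acute; the escape keeps this source file pure ASCII.
        String original = "Canci\u00f3n";

        // Default = UTF-8: the round trip is a no-op.
        System.out.println(reencode(original, StandardCharsets.UTF_8).equals(original));

        // Default = windows-1252: the o-acute is replaced by U+FFFD.
        String lossy = reencode(original, Charset.forName("windows-1252"));
        System.out.println(lossy.equals("Canci\uFFFDn"));
    }
}
```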

And here is the transformation:

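import java.io.OutputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public static void serialize(Document doc, OutputStream out) throws Exception {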
    TransformerFactory tfactory = TransformerFactory.newInstance();
    try {
        Transformer serializer = tfactory.newTransformer();
        serializer.setOutputProperty("indent", "yes");
        serializer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        serializer.transform(new DOMSource(doc), new StreamResult(out));
    } catch (TransformerException e) {
        e.printStackTrace();
        throw new RuntimeException(e);
    }
}
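If the goal is to avoid the platform default entirely, one option is to parse the XML String through a `StringReader` instead of turning it into bytes, and to set `OutputKeys.ENCODING` on the serializer explicitly. This is a sketch, not the poster's code: the class name `SerializeUtf8` and the `roundTrip` helper are hypothetical, while `DocumentBuilderFactory`, `InputSource`, `TransformerFactory` and `OutputKeys` are standard JAXP.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SerializeUtf8 {

    // Variant of serialize() with the output encoding stated explicitly, so the bytes
    // written and the encoding="UTF-8" declaration in the output always agree.
    public static void serialize(Document doc, OutputStream out) throws Exception {
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        serializer.transform(new DOMSource(doc), new StreamResult(out));
    }

    // Parses the String through a Reader (no byte step, so the platform default
    // charset is never consulted), serializes it, and decodes the bytes as UTF-8.
    public static String roundTrip(String xmlString) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmlString)));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            serialize(doc, out);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException("XML round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // U+00F3 is o-acute; the escape keeps this source file pure ASCII.
        System.out.println(roundTrip("<title>Canci\u00f3n</title>"));
    }
}
```

Passing a `FileOutputStream` as `out` leaves the choice of bytes to the Transformer; a `FileWriter`, by contrast, would encode with the platform default charset.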
asked Nov 10 '11 by Daniel Montes de Oca


1 Answer

I tried a simple application in NetBeans that prints Charset.defaultCharset(), and it returns "UTF-8". The same application in Eclipse returns "MacRoman". I'm on a Mac; on Windows it would typically return "windows-1252" (the file.encoding property shows it as "Cp1252").
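A test program along these lines can be run both from NetBeans and with `java -jar` on the same machine to compare the two environments. The class and method names are made up; printing `file.encoding` alongside `Charset.defaultCharset()` is an addition for comparison. On JVMs of that era, starting the JAR with `java -Dfile.encoding=UTF-8 -jar yourapp.jar` (the JAR name is a placeholder) should make it behave like the NetBeans run, although passing explicit charsets in the code is the more robust fix. Since JDK 18 (JEP 400) the default charset is UTF-8 on all platforms.

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {

    // Charset used by String.getBytes(), new String(byte[]), FileReader/FileWriter, etc.
    // whenever no charset is passed explicitly.
    public static String describe() {
        return "defaultCharset = " + Charset.defaultCharset()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```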

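So yes: when you run an application from NetBeans, the default encoding is UTF-8, which is why you didn't have any issues parsing the XML there. When you run the JAR directly with java on Windows, the platform default (usually windows-1252) is used instead, so any code that converts between strings and bytes without naming a charset turns non-ASCII characters into bytes that are not valid UTF-8.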
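The failure can be reproduced without a database or a JAR. The sketch below encodes an XML string with windows-1252 and hands the raw bytes to a parser that trusts its `encoding="UTF-8"` declaration. The class name and sample text are made up for illustration. In windows-1252, characters such as ó (0xF3) or ñ (0xF1) are single bytes that a UTF-8 decoder reads as the start of a 4-byte sequence. The next ASCII byte is not a valid continuation byte, which is consistent with the "Invalid byte 2 of 4-byte UTF-8 sequence" message. The exact exception text may vary between JDK versions, so the demo only reports whatever message it gets.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class EncodingMismatchDemo {

    // U+00F3 is o-acute; the escape keeps this source file pure ASCII.
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><title>Canci\u00f3n</title>";

    // Encodes the XML with the given charset, then parses the raw bytes.
    // The parser decodes them according to the encoding="UTF-8" declaration.
    public static String parseWith(Charset encoding) {
        byte[] bytes = XML.getBytes(encoding);
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            return "ERROR: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseWith(StandardCharsets.UTF_8));            // bytes match the declaration
        System.out.println(parseWith(Charset.forName("windows-1252")));   // bytes do not match
    }
}
```

Passing an explicit charset wherever strings and bytes meet (for example `getBytes(StandardCharsets.UTF_8)` and `new String(bytes, StandardCharsets.UTF_8)`, Java 7+), or avoiding the byte step entirely, should make the program behave the same inside and outside NetBeans.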

answered Nov 03 '22 by Luciano