Invalid byte 2 of 4-byte UTF-8 sequence, but only when executing JAR?

Question

I have a Java program that uses TransformerFactory to transform an XML string (which I get from a SQL Server database), writes the result to a file, and then uses that file to generate a PDF.

It works fine when I run it from NetBeans, but when I run the JAR in the project's dist folder I get "Invalid byte 2 of 4-byte UTF-8 sequence".

After changing the encoding of the XML string to UTF-8, it now works from the JAR too.

So my question is: why did it work when running the project in NetBeans but not from the JAR file before I changed the encoding?

I have only tried this on Windows.

Code:

Here is the code that reads the XML from the SQL Server result set (original):

SQLXML xml = null;
String xmlString = "";
while (rs.next()){
    xml = rs.getSQLXML(1);
    xmlString = xml.getString();
}
return xmlString;

...and modified:

SQLXML xml = null;
String xmlString = "";
while (rs.next()){
    xml = rs.getSQLXML(1);
    // Explicit UTF-8 decoding; note that getBytes() still encodes with the
    // platform default charset before the bytes are decoded as UTF-8
    xmlString = new String(xml.getString().getBytes(),"UTF8");
}
return xmlString;
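On a JVM whose default charset is windows-1252, the modified line encodes the string as windows-1252 and then decodes those bytes as UTF-8. The sketch below (the class `ReencodeDemo` and method `reencode` are hypothetical) names both charsets explicitly so it behaves the same on any platform. It suggests that the change typically avoids the parser error by turning non-ASCII characters such as ñ into U+FFFD replacement characters, not by preserving them, so the generated PDF may be worth checking for lost accents.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReencodeDemo {
    // Mimics new String(s.getBytes(), "UTF8") on a JVM whose default charset is
    // windows-1252, but names both charsets so the result is the same everywhere.
    static String reencode(String s) {
        byte[] cp1252Bytes = s.getBytes(Charset.forName("windows-1252")); // é -> 0xE9, ñ -> 0xF1
        return new String(cp1252Bytes, StandardCharsets.UTF_8);           // malformed UTF-8 -> U+FFFD
    }

    public static void main(String[] args) {
        String original = "a\u00f1o"; // "año", escaped so the source file encoding does not matter
        String result = reencode(original);
        System.out.println(result.equals(original));      // false
        System.out.println(result.indexOf('\uFFFD') >= 0); // true
    }
}
```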

And here is the transformation:

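import java.io.OutputStream;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public static void serialize(Document doc, OutputStream out) throws Exception {
    TransformerFactory tfactory = TransformerFactory.newInstance();
    try {
        Transformer serializer = tfactory.newTransformer();
        // Pretty-print with a two-space indent (indent-amount is a JDK/Xalan-specific key)
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        // Writing to an OutputStream lets the serializer encode the bytes itself
        serializer.transform(new DOMSource(doc), new StreamResult(out));
    } catch (TransformerException e) {
        e.printStackTrace();
        throw new RuntimeException(e);
    }
}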
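The question does not show how the XML string becomes a `Document` or how the file is written. The sketch below assumes those steps and keeps every charset explicit: it parses the string through a `StringReader` (so no String-to-byte conversion happens) and writes the file as UTF-8 with a matching XML declaration. The class `XmlToFile`, its methods, and the temp-file prefix are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlToFile {
    // Parses the XML text through a Reader, so the platform default charset is never involved.
    static Document parse(String xmlString) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xmlString)));
    }

    // Writes UTF-8 bytes and declares encoding="UTF-8", so the file and its header agree.
    static void writeUtf8(Document doc, File file) throws Exception {
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        try (OutputStream out = new FileOutputStream(file)) {
            serializer.transform(new DOMSource(doc), new StreamResult(out));
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("report", ".xml");
        file.deleteOnExit();
        writeUtf8(parse("<root>a\u00f1o</root>"), file);
        String text = new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
        System.out.println(text.contains("a\u00f1o")); // true
    }
}
```

The same idea applies to any other step: pass `StandardCharsets.UTF_8` wherever bytes and Strings meet instead of relying on `getBytes()` or `FileWriter` defaults.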

Luciano · Accepted Answer

I tried a simple application in NetBeans that prints Charset.defaultCharset(), and it returns "UTF-8". The same application in Eclipse returns "MacRoman". I'm on a Mac; on Windows it would return "windows-1252".
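A minimal version of that check might look like the following (the class name is hypothetical). Running it once from NetBeans and once with `java -jar` outside the IDE is a quick way to see whether the two environments disagree. On JDK 18 and later (JEP 400) the default is UTF-8 on every platform unless overridden, so this discrepancy mostly affects older JDKs.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    public static void main(String[] args) {
        // Charset used by String.getBytes(), new String(byte[]), FileReader/FileWriter, etc.
        // whenever no charset is passed explicitly.
        System.out.println("Charset.defaultCharset(): " + Charset.defaultCharset());
        // The system property the default is derived from; a launcher can override it
        // with -Dfile.encoding=..., which is one way run configurations can differ.
        System.out.println("file.encoding: " + System.getProperty("file.encoding"));
    }
}
```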

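So yes: when you run an application from NetBeans, its default charset is UTF-8, which is why you didn't have any issues parsing the XML. When the JAR is run directly on Windows, the default is windows-1252 instead, so any code that converts between Strings and bytes without naming a charset produces different bytes.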
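The mismatch can be reproduced without a database. In the sketch below (class and method names are hypothetical), the same XML is parsed once from UTF-8 bytes and once from windows-1252 bytes. With no BOM and no encoding declaration the parser assumes UTF-8; in windows-1252, ñ is the single byte 0xF1, which UTF-8 reads as the start of a 4-byte sequence, so the following byte ('o') is rejected. On the JDK's built-in parser this is expected to produce an error like the one in the question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

public class EncodingMismatchDemo {
    // Parses raw bytes and returns the root element's text.
    static String parseText(byte[] bytes) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(bytes)).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>a\u00f1o</root>"; // "año"

        // UTF-8 bytes: ñ is C3 B1, so parsing succeeds.
        System.out.println(parseText(xml.getBytes(StandardCharsets.UTF_8)));

        // windows-1252 bytes, as getBytes() or FileWriter would produce on a typical
        // pre-JDK-18 Windows JVM: ñ becomes F1 and the UTF-8 decoder fails.
        try {
            parseText(xml.getBytes(Charset.forName("windows-1252")));
            System.out.println("parsed without error");
        } catch (Exception e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```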

Tags: java, windows, encoding, utf-8, xml-serialization

Daniel Montes de Oca
