How to Preserve the Input's Declared Encoding in the Output of javax.xml.transform.Transformer.transform? (e.g. avoid UTF-16 changing to UTF-8)

Assuming this input XML

<?xml version="1.0" encoding="UTF-16"?>
<test></test>

With this code:

StreamSource source = new StreamSource(new StringReader(/* the above XML*/));
StringWriter stringWriter = new StringWriter();
StreamResult streamResult = new StreamResult(stringWriter);
TransformerFactory.newInstance().newTransformer().transform(source, streamResult);
return stringWriter.getBuffer().toString();

I get this XML as output:

<?xml version="1.0" encoding="UTF-8"?>
<test></test>

(The declared UTF-16 encoding has been replaced by the default, UTF-8.)

I know I can explicitly ask for UTF-16 output:

transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");

But how can I make the output encoding automatically match the input's declared encoding?

Eran Medan asked Jan 31 '11 20:01


4 Answers

To do this, you'll need something more sophisticated than a StreamSource. For example, a StAXSource takes an XMLStreamReader, whose getCharacterEncodingScheme() method tells you which encoding the input document declared; you can then set that as the output encoding.

Michael Borgwardt answered Nov 03 '22 18:11


Try this:

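import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

// Create an XML Stream Reader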
XMLStreamReader xmlSR = XMLInputFactory.newInstance()
        .createXMLStreamReader(new StringReader(/* the above XML*/));
// Wrap the XML Stream Reader in a StAXSource
StAXSource source = new StAXSource(xmlSR);
// Create a String Writer
StringWriter stringWriter = new StringWriter();
// Create a Stream Result
StreamResult streamResult = new StreamResult(stringWriter);
// Create a transformer
Transformer transformer = TransformerFactory.newInstance().newTransformer();
// Copy STANDALONE only if the source declaration set it explicitly
if (xmlSR.standaloneSet()) {
    transformer.setOutputProperty(OutputKeys.STANDALONE,
            xmlSR.isStandalone() ? "yes" : "no");
}
// Copy ENCODING if the source declared one (null means none was declared)
if (xmlSR.getCharacterEncodingScheme() != null) {
    transformer.setOutputProperty(OutputKeys.ENCODING,
            xmlSR.getCharacterEncodingScheme());
}
// Copy VERSION if the source declared one
if (xmlSR.getVersion() != null) {
    transformer.setOutputProperty(OutputKeys.VERSION, xmlSR.getVersion());
}
// Transform the source stream to the out stream
transformer.transform(source, streamResult);
// Return the result as a string
return stringWriter.getBuffer().toString();
tecfield answered Nov 03 '22 19:11


You need to peek into the stream first. Appendix F of the XML specification ("Autodetection of Character Encodings") describes how to auto-detect the encoding.
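
As a rough sketch of that peek, the following Java reads at most four bytes and compares them with the byte patterns listed in Appendix F. The class and method names (EncodingSniffer, guess, peek) are made up for illustration, UTF-32 is used as the Java charset name for what the appendix calls UCS-4, and the unusual UCS-4 byte orders and EBCDIC are left out. A result of "UTF-8" only means "ASCII-compatible or unknown"; for such input the encoding declaration itself still has to be read.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical helper: guesses the encoding family from the first bytes of an XML stream.
class EncodingSniffer {

    // head holds the first bytes of the stream; len is how many of them are valid (0 to 4).
    static String guess(byte[] head, int len) {
        int[] b = new int[4];
        for (int i = 0; i < 4; i++) {
            b[i] = i < len ? head[i] & 0xFF : -1; // -1 marks "byte not available"
        }
        // Byte order marks; the four-byte forms must be tested before the two-byte forms.
        if (b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF) return "UTF-32BE";
        if (b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00) return "UTF-32LE";
        if (b[0] == 0xFE && b[1] == 0xFF) return "UTF-16BE";
        if (b[0] == 0xFF && b[1] == 0xFE) return "UTF-16LE";
        if (b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return "UTF-8";
        // No byte order mark: look for "<?" encoded as 16-bit code units.
        if (b[0] == 0x00 && b[1] == 0x3C && b[2] == 0x00 && b[3] == 0x3F) return "UTF-16BE";
        if (b[0] == 0x3C && b[1] == 0x00 && b[2] == 0x3F && b[3] == 0x00) return "UTF-16LE";
        // Assumption of this sketch: everything else is treated as UTF-8.
        return "UTF-8";
    }

    // Reads up to four bytes, pushes them back, and returns the guess.
    // The stream must have been created with a pushback buffer of at least 4 bytes.
    static String peek(PushbackInputStream in) throws IOException {
        byte[] head = new byte[4];
        int n = in.readNBytes(head, 0, 4); // InputStream.readNBytes requires Java 9 or later
        if (n > 0) {
            in.unread(head, 0, n);
        }
        return guess(head, n);
    }
}
```

The caller is assumed to wrap the raw stream as new PushbackInputStream(raw, 4), so that the peeked bytes are still there when the stream is handed to the parser.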

Jochen Bedersdorfer answered Nov 03 '22 19:11


The XSLT processor doesn't actually know what the input encoding is (the XML parser doesn't tell it, because it doesn't need to know). You can set the output encoding using xsl:output, but to make this the same as the input encoding you're going to have to discover the input encoding first, for example by peeking at the source file before parsing it.
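
For input that is already a String, as in the question, one way to do that peek is a regular expression over the XML declaration. This is only a sketch under assumptions: DeclaredEncoding, of and copy are illustrative names, the declaration is expected at the very start of the text with no leading BOM character, and UTF-8 is assumed when no encoding is declared.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: reads the declared encoding before the document is parsed.
class DeclaredEncoding {

    // Matches the encoding pseudo-attribute inside an XML declaration at the start of the text.
    // [^>]*? cannot run past the closing "?>", so attributes of later elements are ignored.
    private static final Pattern DECL = Pattern.compile(
            "^<\\?xml\\s[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Returns the declared encoding name, or "UTF-8" (an assumption) when none is declared.
    static String of(String xml) {
        Matcher m = DECL.matcher(xml);
        return m.find() ? m.group(1) : "UTF-8";
    }

    // Identity transform that asks the serializer for the encoding found by of().
    static String copy(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, of(xml));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

With the question's input, of(...) returns "UTF-16", so copy(...) should write a declaration carrying that encoding rather than the default.").equals("UTF-16")) throw new AssertionError("double-quoted declaration");
if (!DeclaredEncoding.of("<?xml version='1.0' encoding='ISO-8859-1'?><a/>").equals("ISO-8859-1")) throw new AssertionError("single-quoted declaration");
if (!DeclaredEncoding.of("<test/>").equals("UTF-8")) throw new AssertionError("no declaration");
if (!DeclaredEncoding.of("<?xml version=\"1.0\"?><a encoding=\"x\"/>").equals("UTF-8")) throw new AssertionError("attribute outside declaration");
</test>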

Michael Kay answered Nov 03 '22 19:11
