Assuming this input XML:
<?xml version="1.0" encoding="UTF-16"?>
<test></test>
and running these lines of code:
StreamSource source = new StreamSource(new StringReader(/* the above XML*/));
StringWriter stringWriter = new StringWriter();
StreamResult streamResult = new StreamResult(stringWriter);
TransformerFactory.newInstance().newTransformer().transform(source, streamResult);
return stringWriter.getBuffer().toString();
I get this XML as output:
<?xml version="1.0" encoding="UTF-8"?>
<test></test>
(the declared encoding of UTF-16 is converted to the default UTF-8)
I know I can explicitly ask for UTF-16 output:
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
But the question is: how can I make the output encoding automatically match the input encoding?
To do this, you'll have to use something more sophisticated than a StreamSource. For example, a StAXSource takes an XMLStreamReader, which has the getCharacterEncodingScheme() method that tells you which encoding the input document declared (it returns null if no encoding was declared). You can then set that as the output encoding.
Try this:
// Create an XML Stream Reader
XMLStreamReader xmlSR = XMLInputFactory.newInstance()
.createXMLStreamReader(new StringReader(/* the above XML*/));
// Wrap the XML Stream Reader in a StAXSource
StAXSource source = new StAXSource(xmlSR);
// Create a String Writer
StringWriter stringWriter = new StringWriter();
// Create a Stream Result
StreamResult streamResult = new StreamResult(stringWriter);
// Create a transformer
Transformer transformer = TransformerFactory.newInstance().newTransformer();
// Set STANDALONE only if the source declared it
if (xmlSR.standaloneSet()) {
    transformer.setOutputProperty(OutputKeys.STANDALONE,
        xmlSR.isStandalone() ? "yes" : "no");
}
// Set ENCODING based on the source stream (null if none was declared)
if (xmlSR.getCharacterEncodingScheme() != null) {
    transformer.setOutputProperty(OutputKeys.ENCODING,
        xmlSR.getCharacterEncodingScheme());
}
// Set VERSION based on the source stream (null if none was declared)
if (xmlSR.getVersion() != null) {
    transformer.setOutputProperty(OutputKeys.VERSION, xmlSR.getVersion());
}
// Transform the source stream to the out stream
transformer.transform(source, streamResult);
// Return the result as a string
return stringWriter.getBuffer().toString();
You need to peek into the stream first. Appendix F of the XML specification ("Autodetection of Character Encodings") gives you an idea of how to auto-detect the encoding.
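As a rough illustration of the peeking idea, here is a minimal sketch of the byte-pattern checks described in Appendix F. The class and method names (XmlEncodingSniffer, sniff) are made up for this example, and the sketch deliberately leaves out the BOM-less UCS-4 and EBCDIC cases. When it returns null, the input is either ASCII-compatible or unrecognized, and you would still have to read the encoding pseudo-attribute from the XML declaration yourself.

```java
public class XmlEncodingSniffer {

    /** Returns true when data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Guesses an encoding from the first bytes of an XML document.
     * Returns null when no byte order mark or 16-bit "<?" pattern is found;
     * in that case the XML declaration has to be read to learn the encoding.
     */
    public static String sniff(byte[] head) {
        // Check the 4-byte marks first: FF FE 00 00 also starts with the UTF-16LE mark.
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        // No byte order mark: look for "<?" stored as two 16-bit code units.
        if (startsWith(head, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE";
        if (startsWith(head, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";
        return null;
    }
}
```

You would call sniff on the first four bytes read from the file (for example through a BufferedInputStream with mark/reset) and fall back to parsing the declaration when it returns null.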
The XSLT processor doesn't actually know what the input encoding is (the XML parser doesn't tell it, because it doesn't need to know). You can set the output encoding using xsl:output, but to make this the same as the input encoding you're going to have to discover the input encoding first, for example by peeking at the source file before parsing it.
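One way to do that peeking without hand-written detection is to open a throwaway XMLStreamReader over the raw bytes, ask it for the encoding, and then run the transformation on a fresh stream. The following is only a sketch under some assumptions: the input fits in memory as a byte array, the class and method names (SameEncodingTransform, transform) are invented for illustration, and falling back to UTF-8 when nothing can be determined is my choice, not something the API mandates.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SameEncodingTransform {

    /** Copies the XML through an identity transform, keeping the input encoding. */
    public static byte[] transform(byte[] xml) throws Exception {
        // First pass: peek at the document only to learn its encoding.
        XMLStreamReader peek = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(xml));
        String declared = peek.getCharacterEncodingScheme(); // from the XML declaration, may be null
        String detected = peek.getEncoding();                // what the parser detected, may be null
        peek.close();

        // Assumed fallback: use UTF-8 when neither value is available.
        String encoding = declared != null ? declared
                : (detected != null ? detected : "UTF-8");

        // Second pass: transform a fresh stream with the discovered encoding.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new StreamSource(new ByteArrayInputStream(xml)),
                new StreamResult(out));
        return out.toByteArray();
    }
}
```

Writing to a byte stream rather than a StringWriter means the bytes themselves are produced in the chosen encoding, not just the declaration. For large files you would reopen the file for the second pass instead of buffering everything in memory.".getBytes(java.nio.charset.StandardCharsets.UTF_16);
String out = new String(SameEncodingTransform.transform(in), java.nio.charset.StandardCharsets.UTF_16);
assert out.contains("encoding=\"UTF-16\"");
assert out.contains("<test");
</test>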