
Java, XML DocumentBuilder - setting the encoding when parsing

I'm trying to save a tree (a class that extends JTree) which holds an XML document to a DOM object after changing its structure.

I have created a new document object, traversed the tree to retrieve the contents successfully (including the original encoding of the XML document), and now have a ByteArrayInputStream which has the tree contents (XML document) with the correct encoding.

The problem is that when I parse the ByteArrayInputStream, the encoding in the XML document is automatically changed to UTF-8.

Is there a way to prevent this and keep the encoding provided in the ByteArrayInputStream?

It's also worth adding that I have already used the transformer.setOutputProperty(OutputKeys.ENCODING, encoding) method to retrieve the right encoding.

Any help would be appreciated.

Ralph D asked Aug 26 '10 18:08


3 Answers

Here's an updated answer, since OutputFormat is deprecated:

TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");

StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(document), new StreamResult(writer));
String output = writer.getBuffer().toString().replaceAll("\n|\r", "");

The second part (the StringWriter and the transform call) returns the XML document as a String; the replaceAll call then strips the line breaks from it.
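If the result is needed as bytes rather than as a String, the same Transformer can write to an OutputStream, so that the bytes themselves are produced in the requested encoding. The following is only a minimal sketch, assuming the input carries an XML declaration and that the JDK's default parser and transformer are in use; the class name EncodingRoundTrip and the method reserialize are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper class, not part of the original answer.
class EncodingRoundTrip {

    // Parses XML bytes and serializes the DOM again, reusing the declared encoding.
    static byte[] reserialize(byte[] xmlBytes) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(new ByteArrayInputStream(xmlBytes));

        // getXmlEncoding() reports the encoding named in the XML declaration,
        // or null when the document has no such declaration.
        String encoding = document.getXmlEncoding();
        if (encoding == null) {
            encoding = "UTF-8"; // assumption: fall back to UTF-8 when nothing is declared
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);

        // Writing to an OutputStream lets the transformer encode the bytes itself.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>";
        byte[] result = reserialize(xml.getBytes("ISO-8859-1"));
        System.out.println(new String(result, "ISO-8859-1"));
    }
}
```

When the target is a StringWriter, as in the snippet above, the caller still has to pick the same encoding when the String is later turned into bytes.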

Cyril N. answered Nov 18 '22 12:11


// Read XML
String xml = "xml"; // placeholder for the XML text
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xml)));

// Append formatting
OutputFormat format = new OutputFormat(document);

if (document.getXmlEncoding() != null) {
  format.setEncoding(document.getXmlEncoding());
}

format.setLineWidth(100);
format.setIndenting(true);
format.setIndent(5);
Writer out = new StringWriter();
XMLSerializer serializer = new XMLSerializer(out, format);
serializer.serialize(document);
String result = out.toString();
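The null check on document.getXmlEncoding() in the code above matters because that method only reports what the XML declaration says. A small sketch of that behaviour, using only the JDK parser (the class name DeclaredEncoding and its method are hypothetical, introduced for illustration):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
class DeclaredEncoding {

    // Returns the encoding named in the XML declaration, or null if there is none.
    static String declaredEncoding(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(new InputSource(new StringReader(xml)));
        return document.getXmlEncoding();
    }

    public static void main(String[] args) throws Exception {
        // Document with a declaration that names an encoding.
        System.out.println(declaredEncoding("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"));
        // Document without any XML declaration.
        System.out.println(declaredEncoding("<a/>"));
    }
}
```

Without the check, a document that lacks a declaration would pass null on to the serializer settings.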

Andrey answered Nov 18 '22 11:11


I solved it after a lot of trial and error.

I was using

OutputFormat format = new OutputFormat(document);

but changed it to

OutputFormat format = new OutputFormat(d, encoding, true);

and this solved my problem.

Here d is the Document, encoding is the encoding I want the output to use, and true turns indenting on.

Note to self: read more carefully. I had looked at the Javadoc hours earlier; if only I had read it properly.

Ralph D answered Nov 18 '22 11:11