 

How do I load an org.w3c.dom.Document from XML in a string?

I have a complete XML document in a string and would like a Document object. Google turns up all sorts of garbage. What is the simplest solution? (In Java 1.5)

Solution: Thanks to Matt McMinn, I have settled on this implementation. It has the right level of input flexibility and exception granularity for me. (It's good to know whether an error came from malformed XML, a SAXException, or just bad I/O, an IOException.)

    public static org.w3c.dom.Document loadXMLFrom(String xml)
        throws org.xml.sax.SAXException, java.io.IOException {
        return loadXMLFrom(new java.io.ByteArrayInputStream(xml.getBytes()));
    }

    public static org.w3c.dom.Document loadXMLFrom(java.io.InputStream is)
        throws org.xml.sax.SAXException, java.io.IOException {
        javax.xml.parsers.DocumentBuilderFactory factory =
            javax.xml.parsers.DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        javax.xml.parsers.DocumentBuilder builder;
        try {
            builder = factory.newDocumentBuilder();
        }
        catch (javax.xml.parsers.ParserConfigurationException ex) {
            // Don't swallow this: a null builder would only fail later with a NullPointerException.
            throw new IllegalStateException(ex);
        }
        try {
            return builder.parse(is);
        }
        finally {
            is.close(); // close the stream even if parsing fails
        }
    }
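To illustrate the exception granularity mentioned above, here is a minimal sketch of a caller that tells malformed XML apart from I/O failure. The class name ParseErrorDemo and the classify helper are hypothetical, made up for this example; only the JAXP and SAX types are real.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ParseErrorDemo {
    // Hypothetical helper: parses xml and reports which kind of failure, if any, occurred.
    static String classify(String xml) {
        DocumentBuilder builder;
        try {
            builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException ex) {
            // A misconfigured parser factory is a programming error, not bad input.
            throw new IllegalStateException(ex);
        }
        try {
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return "ok:" + doc.getDocumentElement().getTagName();
        } catch (SAXException ex) {
            // Not well-formed (e.g. mismatched tags): a SAXParseException, which extends SAXException.
            return "malformed";
        } catch (IOException ex) {
            // Failure reading the input itself.
            return "io";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("<root><child/></root>"));
        System.out.println(classify("<root><child></root>"));
    }
}
```

With the JDK's built-in parser, the malformed case typically also prints a "[Fatal Error]" line to standard error unless you install your own ErrorHandler via builder.setErrorHandler(...).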
asked Aug 28 '08 at 20:08 by Frank Krueger





1 Answer

Whoa there!

There's a potentially serious problem with this code, because it ignores the character encoding the parser expects, which is the one declared in the XML (UTF-8 by default). When you call String.getBytes(), the platform default encoding is used to encode Unicode characters to bytes. So the parser may think it's getting UTF-8 data when in fact it's getting EBCDIC or something… not pretty!
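A minimal sketch of the mismatch, assuming a document with no encoding declaration (so the parser falls back to UTF-8). Rather than depending on the real platform default, it calls getBytes(StandardCharsets.ISO_8859_1) to stand in for getBytes() on a machine whose default charset is ISO-8859-1. StandardCharsets needs Java 7; on Java 1.5, getBytes("ISO-8859-1") produces the same bytes. The class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class EncodingMismatchDemo {
    // "é" (U+00E9) in a document with no encoding declaration,
    // so a parser reading raw bytes assumes UTF-8.
    static final String XML = "<a>\u00e9</a>";

    // Parse raw bytes, as loadXMLFrom(String) does after calling getBytes().
    static String parseBytes(byte[] bytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception ex) {
            // ISO-8859-1 byte 0xE9 followed by '<' is not valid UTF-8, so parsing fails.
            return "failed: " + ex.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Stand-in for getBytes() on a platform whose default charset is ISO-8859-1.
        String latin1 = parseBytes(XML.getBytes(StandardCharsets.ISO_8859_1));
        // Bytes that really are UTF-8 match what the parser assumes.
        String utf8 = parseBytes(XML.getBytes(StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 bytes recovered the text: " + latin1.equals("\u00e9"));
        System.out.println("UTF-8 bytes recovered the text: " + utf8.equals("\u00e9"));
    }
}
```

Only the UTF-8 case gets the original text back; which case you hit with a plain getBytes() depends on the machine the code runs on.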

Instead, use the parse method that takes an InputSource, which can be constructed with a Reader, like this:

    import java.io.StringReader;
    import org.xml.sax.InputSource;
    …
            return builder.parse(new InputSource(new StringReader(xml)));
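For reference, here is a self-contained sketch of the question's loadXMLFrom(String) with that change applied. It keeps the namespace-aware factory from the question. The class name XmlLoader and the main method are illustrative additions.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XmlLoader {
    // Parses a complete XML document held in a String into a DOM Document.
    // The String reaches the parser as characters through a Reader, so no
    // byte encoding step (and no platform default charset) is involved.
    static Document loadXMLFrom(String xml) throws SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder;
        try {
            builder = factory.newDocumentBuilder();
        } catch (ParserConfigurationException ex) {
            // Only happens if the factory cannot satisfy its configuration;
            // rethrow instead of continuing with a null builder.
            throw new IllegalStateException(ex);
        }
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = loadXMLFrom("<greeting lang=\"fr\">Bonjour</greeting>");
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(doc.getDocumentElement().getAttribute("lang"));
    }
}
```

Because the parser is handed a character stream, the InputSource documentation says any encoding="..." in the XML declaration is disregarded, which is what you want when the text is already a Java String.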

It may not seem like a big deal, but ignorance of character encoding issues leads to insidious code rot akin to Y2K.

answered Sep 19 '22 at 18:09 by erickson
