Why is the non-validating DocumentBuilder in the SSCCE below trying to read the DTD file?
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class FooMain {
    private static String XML_INSTANCE = "<?xml version=\"1.0\"?> "+
                                         "<!DOCTYPE note SYSTEM \"does-not-exist.dtd\"> "+
                                         "<a/> ";

    public static void main(String args[]) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(false);
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        InputStream is = new ByteArrayInputStream(XML_INSTANCE.getBytes("UTF-8"));
        Document doc = builder.parse(is);
    }
}
Code explodes with:
[java] Exception in thread "main" java.io.FileNotFoundException: /lhome/minimal-for-SO/does-not-exist.dtd (No such file or directory)
[java] at java.io.FileInputStream.open(Native Method)
[java] at java.io.FileInputStream.<init>(FileInputStream.java:146)
[java] at java.io.FileInputStream.<init>(FileInputStream.java:101)
[java] at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
[java] at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
[java] at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
[java] at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
[java] at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown Source)
[java] at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
[java] at org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown Source)
[java] at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
[java] at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
[java] at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
[java] at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
[java] at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
[java] at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
[java] at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
[java] at FooMain.main(FooMain.java:35)
Given that the builder is non-validating, I would expect it at least not to crash when the file is not found (if not to skip the search for the DTD file altogether). So what prevents the document from being parsed, given that the builder is non-validating and so need not access the DTD?
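Calling setValidating(false) only turns off validation. By default the parser (Xerces, judging by the stack trace) still loads the external DTD, because a DTD can also declare entities and default attribute values. To ignore DTD instructions and references, you must set two more features on the factory before creating the builder: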
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
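To show where those two calls fit, here is a minimal sketch of the question's program with both features switched off. The class name SkipDtdSketch and the helper parseIgnoringDtd are made up for illustration, and the sketch assumes the parser in use is Xerces or the JDK's built-in parser, which both recognize these feature URIs; other parsers may throw ParserConfigurationException from setFeature.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical class name; a sketch, not the asker's actual code.
class SkipDtdSketch {
    // Same document as in the question: the DOCTYPE points at a DTD that does not exist.
    static final String XML_INSTANCE = "<?xml version=\"1.0\"?> "
            + "<!DOCTYPE note SYSTEM \"does-not-exist.dtd\"> "
            + "<a/> ";

    // Hypothetical helper: parses the XML without fetching the external DTD.
    static Document parseIgnoringDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(false);
        factory.setValidating(false);
        // Xerces-specific features (assumed to be supported by the parser in use).
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        InputStream is = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        return builder.parse(is);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseIgnoringDtd(XML_INSTANCE);
        // Print the root element name to confirm the document was parsed.
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

The features must be set on the factory before newDocumentBuilder() is called, since the builder takes its configuration from the factory at creation time.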
If you are building a web application, I suggest globally disabling the resolution of DTD entities, because it is a potential security vulnerability.
For example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "file:///dev/random" >]><foo>&xxe;</foo>
can hang or crash your server, because the parser tries to substitute the endless content of /dev/random for &xxe;.
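One way to close this hole, assuming your application never needs to accept a DOCTYPE at all, is to have the parser reject any document that contains one. The sketch below uses the Xerces feature disallow-doctype-decl, which is not mentioned above and is my suggestion rather than part of the original fix; the class name StrictParserSketch, the helper rejectsDoctype and the entity path are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;

// Hypothetical class name; a sketch of one possible hardening, not the only one.
class StrictParserSketch {
    // Hypothetical helper: returns true if the parser refuses the document.
    static boolean rejectsDoctype(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Xerces feature (assumed supported): any DOCTYPE declaration becomes a fatal error.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        try {
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false; // parsed normally, so no DOCTYPE was present
        } catch (SAXException e) {
            return true; // rejected before any entity could be resolved
        }
    }

    public static void main(String[] args) throws Exception {
        // Same shape as the payload above, but with a harmless, made-up file path.
        String payload = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE foo [<!ELEMENT foo ANY >"
                + "<!ENTITY xxe SYSTEM \"file:///nonexistent/secret.txt\" >]>"
                + "<foo>&xxe;</foo>";
        System.out.println(rejectsDoctype(payload));
        System.out.println(rejectsDoctype("<foo/>"));
    }
}
```

If you do have to accept documents with a DOCTYPE, as in the question, a less strict alternative is to set the SAX features http://xml.org/sax/features/external-general-entities and http://xml.org/sax/features/external-parameter-entities to false instead, so the DOCTYPE is tolerated but external entities are not fetched.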