
non-validating DocumentBuilder trying to read DTD file

Tags: java, xml, java-7, dtd

Why is the non-validating DocumentBuilder in the SSCCE below trying to read the DTD file?

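import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class FooMain  {

    // The DOCTYPE references an external DTD file that does not exist on disk.
    private static String XML_INSTANCE = "<?xml version=\"1.0\"?>                        "+
                                         "<!DOCTYPE note SYSTEM \"does-not-exist.dtd\">  "+
                                         "<a/>                                           ";


    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(false);
        factory.setValidating(false); // validation is off, yet the DTD is still fetched
        DocumentBuilder builder = factory.newDocumentBuilder();

        InputStream is = new ByteArrayInputStream(XML_INSTANCE.getBytes("UTF-8"));
        Document doc = builder.parse(is); // throws FileNotFoundException
    }
}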

Code explodes with:

[java] Exception in thread "main" java.io.FileNotFoundException: /lhome/minimal-for-SO/does-not-exist.dtd (No such file or directory)
 [java]     at java.io.FileInputStream.open(Native Method)
 [java]     at java.io.FileInputStream.<init>(FileInputStream.java:146)
 [java]     at java.io.FileInputStream.<init>(FileInputStream.java:101)
 [java]     at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
 [java]     at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
 [java]     at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
 [java]     at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
 [java]     at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown Source)
 [java]     at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
 [java]     at org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown Source)
 [java]     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
 [java]     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
 [java]     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
 [java]     at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
 [java]     at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
 [java]     at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
 [java]     at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
 [java]     at FooMain.main(FooMain.java:35)

Given that the builder is non-validating, I would expect it at least not to crash when the file is not found (if not to skip the search for the DTD file altogether). So what prevents the document from being parsed, given that the builder is non-validating and so need not access the DTD?

Marcus Junius Brutus asked Mar 20 '23 01:03



1 Answer

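setValidating(false) only turns off validation. The parser (Xerces, according to your stack trace) still loads the external DTD, for example to pick up entity declarations and default attribute values. To make it ignore DTD instructions and references, you must set some more flags: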

factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
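
To show where those two calls go, here is a minimal sketch of the SSCCE from the question with the flags applied. The class name DtdFreeParser and the parse helper are made up for illustration. It assumes the parser behind DocumentBuilderFactory is Xerces or the Xerces-derived parser bundled with the JDK; another implementation may reject these feature URIs with a ParserConfigurationException.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DtdFreeParser {

    // Same document as in the question: the DOCTYPE points at a DTD that does not exist.
    public static final String XML_INSTANCE =
            "<?xml version=\"1.0\"?>"
          + "<!DOCTYPE note SYSTEM \"does-not-exist.dtd\">"
          + "<a/>";

    // Hypothetical helper: parses the XML without ever touching the external DTD.
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(false);
        factory.setValidating(false);
        // Xerces-specific features; assumed to be recognized by the parser in use.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        try (InputStream is = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            return builder.parse(is);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML_INSTANCE);
        // Print the root element name to confirm the parse went through.
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

The features are set on the factory before newDocumentBuilder() is called, so every builder created from that factory inherits them.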

If you are building a web application, I suggest you globally disable the resolution of DTD entities, because it is a potential security vulnerability (an XML External Entity, or XXE, attack).

For example:

<?xml version="1.0" encoding="ISO-8859-1"?>
 <!DOCTYPE foo [  
  <!ELEMENT foo ANY >
   <!ENTITY xxe SYSTEM "file:///dev/random" >]><foo>&xxe;</foo>

can hang or crash your server, because the parser tries to replace &xxe; with the never-ending content of /dev/random.
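
If your application never needs a DOCTYPE at all, one stricter option is to reject such documents outright. The sketch below is only an illustration under that assumption: NoDoctypeParser, parse and isRejected are made-up names, and the disallow-doctype-decl feature is again specific to Xerces and the JDK's built-in parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class NoDoctypeParser {

    // Hypothetical helper: parses XML, refusing any document that declares a DOCTYPE.
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Xerces-specific feature: a DOCTYPE declaration becomes a fatal parse error.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns true when the parser refuses the document with a SAXException.
    public static boolean isRejected(String xml) throws Exception {
        try {
            parse(xml);
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        String hostile = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE foo [<!ENTITY xxe SYSTEM \"file:///dev/random\">]>"
                + "<foo>&xxe;</foo>";
        System.out.println("hostile rejected: " + isRejected(hostile));
        System.out.println("plain rejected:   " + isRejected("<foo/>"));
    }
}
```

The trade-off is that this also refuses harmless documents that merely carry a DOCTYPE, such as the one in the question, so it fits untrusted input better than legacy files. The default error handler may additionally print a "[Fatal Error]" line to stderr when a document is rejected.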

MGorgon answered Apr 02 '23 19:04
