
How can I tell Xalan NOT to validate XML retrieved using the "document" function?

Yesterday Oracle decided to take down java.sun.com for a while. This screwed things up for me because xalan tried to validate some XML but couldn't retrieve the properties.dtd.

I'm using Xalan 2.7.1 to run some XSL transforms, and I don't want it to validate anything, so I tried loading the XSL like this:

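// Assumption: "factory" is a TransformerFactory created elsewhere in the original code.
TransformerFactory factory = TransformerFactory.newInstance();

// Non-validating, namespace-aware reader used to parse the stylesheet
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
spf.setValidating(false);
XMLReader rdr = spf.newSAXParser().getXMLReader();
Source xsl = new SAXSource(rdr, new InputSource(xslFilePath));
Templates cachedXSLT = factory.newTemplates(xsl);
Transformer transformer = cachedXSLT.newTransformer();
transformer.transform(xmlSource, result);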

In the XSL itself, I do something like this:

  <xsl:variable name="entry" select="document(concat($prefix, $locale_part, $suffix))/properties/entry[@key=$key]"/>

The XML this code retrieves has the following definition at the top:

<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<entry key="...

Although the Java code above instructs the parser NOT to validate, it still sends a request to java.sun.com. While java.sun.com is unavailable, the transform fails with the message:

 Can not load requested doc: http://java.sun.com/dtd/properties.dtd

How do I get Xalan to stop trying to validate the XML loaded by the "document" function?

nont asked Jun 30 '11 17:06



2 Answers

The documentation mentions that the parser may read the DTDs even if not validating, as it may become necessary to use the DTD to resolve (expand) entities.

Since I don't have control over the XML documents, nont's option of modifying the XML was not available to me.

I managed to shut down attempts to pull in DTD documents by sabotaging the resolver, as follows.

My code uses a DocumentBuilder to return a Document (a DOM), but the XMLReader in the OP's example also has a setEntityResolver method, so the same technique should work there.

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false); // turns off validation
factory.setSchema(null);      // turns off use of schema
                              // but that's *still* not enough!
DocumentBuilder builder = factory.newDocumentBuilder();
builder.setEntityResolver(new NullEntityResolver()); // swap in a dummy resolver
return builder.parse(xmlFile);

Here is my fake resolver. It returns an empty InputStream no matter what is asked of it.

/** my resolver that doesn't */
private static class NullEntityResolver implements EntityResolver {

    public InputSource resolveEntity(String publicId, String systemId) 
    throws SAXException, IOException {
        // Message only for debugging / if you care
        System.out.println("I'm asked to resolve: " + publicId + " / " + systemId);
        return new InputSource(new ByteArrayInputStream(new byte[0]));
    }

}

Alternatively, your fake resolver could return streams of actual documents read from local resources, such as a copy of the DTD bundled with your application.
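A minimal sketch of that alternative, applied to a SAX XMLReader like the one in the question. The class name, the readEntry helper, and the in-memory LOCAL_DTD string are hypothetical; the string is a stand-in for a DTD copy you would normally read from a file or classpath resource. Unknown system IDs still get the empty stream from the resolver above.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class LocalDtdResolverSketch {

    static final String PROPERTIES_DTD_ID = "http://java.sun.com/dtd/properties.dtd";

    // Stand-in for a local DTD copy (assumption: a real project would load
    // this from a file or classpath resource instead of a string constant).
    static final String LOCAL_DTD =
          "<!ELEMENT properties (comment?, entry*)>\n"
        + "<!ATTLIST properties version CDATA #FIXED \"1.0\">\n"
        + "<!ELEMENT comment (#PCDATA)>\n"
        + "<!ELEMENT entry (#PCDATA)>\n"
        + "<!ATTLIST entry key CDATA #REQUIRED>\n";

    /** Serves the local copy for the known system ID and an empty stream otherwise. */
    static final EntityResolver RESOLVER = (publicId, systemId) -> {
        byte[] body = PROPERTIES_DTD_ID.equals(systemId)
                ? LOCAL_DTD.getBytes(StandardCharsets.UTF_8)
                : new byte[0];
        return new InputSource(new ByteArrayInputStream(body));
    };

    /** Parses xml with the resolver installed and returns the text of the matching entry. */
    static String readEntry(String xml, String key) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.setValidating(false);
        XMLReader rdr = spf.newSAXParser().getXMLReader();
        rdr.setEntityResolver(RESOLVER); // no network access for DTDs

        StringBuilder text = new StringBuilder();
        rdr.setContentHandler(new DefaultHandler() {
            private boolean inMatch;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                inMatch = "entry".equals(local) && key.equals(atts.getValue("key"));
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                inMatch = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inMatch) text.append(ch, start, length);
            }
        });
        rdr.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE properties SYSTEM \"" + PROPERTIES_DTD_ID + "\">"
                   + "<properties><entry key=\"greeting\">hello</entry></properties>";
        System.out.println(readEntry(xml, "greeting"));
    }
}
```

Serving a real DTD copy keeps entity expansion and attribute defaults working, which the empty-stream trick gives up.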

Carl Smotricz answered Sep 22 '22 11:09



Be aware that disabling DTD loading will cause parsing to fail if the DTD defines any entities that your XML file depends on. That said, to disable DTD loading try this, which assumes you're using the default Xerces that ships with Java.

    /*
     * Instantiate the SAXParser and set the features to prevent loading of an external DTD
     */
    SAXParser sp = SAXParserFactory.newInstance().newSAXParser();
    XMLReader xrdr = sp.getXMLReader();
    xrdr.setFeature("http://xml.org/sax/features/validation", false);
    xrdr.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
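One caveat, as far as I can tell: a reader wrapped around the stylesheet source only parses the stylesheet, so documents pulled in by document() never see these features. A possible way to get them there is a URIResolver on the Transformer that hands back a SAXSource carrying a reader configured as above. The sketch below assumes the processor honors the XMLReader inside the returned SAXSource. The class name, the transform helper, and the in-memory docs map are hypothetical; a real resolver would open href resolved against base.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class NoDtdDocumentFunctionSketch {

    /** Builds a namespace-aware reader that neither validates nor loads external DTDs. */
    static XMLReader newNoDtdReader() throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader rdr = spf.newSAXParser().getXMLReader();
        rdr.setFeature("http://xml.org/sax/features/validation", false);
        rdr.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return rdr;
    }

    /**
     * Runs the stylesheet over xml. The docs map (href to XML text) stands in
     * for the files that document() would normally load from disk or the network.
     */
    static String transform(String xsl, String xml, Map<String, String> docs) throws Exception {
        URIResolver resolver = (href, base) -> {
            String body = docs.get(href);
            if (body == null) {
                return null; // unknown href: fall back to the processor's default resolution
            }
            try {
                // A fresh reader per document, carrying the no-DTD features
                return new SAXSource(newNoDtdReader(), new InputSource(new StringReader(body)));
            } catch (Exception e) {
                throw new TransformerException(e);
            }
        };

        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        t.setURIResolver(resolver); // consulted for document() calls
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

The same resolver could instead install an EntityResolver on the reader if the DTD's entities are needed.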

If you really need the DTD, the alternative is to implement a local XML catalog:

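    /*
     * Instantiate the SAXParser and add catalog support.
     * CatalogResolver here is org.apache.xml.resolver.tools.CatalogResolver
     * from the Apache XML Commons Resolver library.
     */
    SAXParser sp = SAXParserFactory.newInstance().newSAXParser();
    XMLReader xrdr = sp.getXMLReader();

    CatalogResolver cr = new CatalogResolver();
    xrdr.setEntityResolver(cr);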

For this you will have to provide the appropriate DTDs and an XML catalog definition. This Wikipedia article and this article were helpful.

CatalogResolver looks at the system property xml.catalog.files to determine what catalogs to load.
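For illustration, here is a sketch of the catalog idea using the JDK's built-in javax.xml.catalog API (Java 9 and later) rather than the Apache CatalogResolver shown above. This is a swapped-in API, not the one this answer uses. The class and method names, the temporary directory, and the inlined DTD text are hypothetical. The catalog holds one system entry that maps the remote DTD URL to a local file.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class JdkCatalogSketch {

    static final String PROPERTIES_DTD_ID = "http://java.sun.com/dtd/properties.dtd";

    /** Writes a local DTD copy and a one-entry catalog into dir; returns the catalog path. */
    static Path writeCatalog(Path dir) throws Exception {
        Files.writeString(dir.resolve("properties.dtd"),
              "<!ELEMENT properties (comment?, entry*)>\n"
            + "<!ATTLIST properties version CDATA #FIXED \"1.0\">\n"
            + "<!ELEMENT comment (#PCDATA)>\n"
            + "<!ELEMENT entry (#PCDATA)>\n"
            + "<!ATTLIST entry key CDATA #REQUIRED>\n");
        Path catalog = dir.resolve("catalog.xml");
        Files.writeString(catalog,
              "<?xml version=\"1.0\"?>\n"
            + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
            + "  <system systemId=\"" + PROPERTIES_DTD_ID + "\" uri=\"properties.dtd\"/>\n"
            + "</catalog>\n");
        return catalog;
    }

    /** Creates a resolver backed by the given catalog file. */
    static CatalogResolver resolverFor(Path catalogFile) {
        return CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogFile.toUri());
    }

    /**
     * Parses xml with the catalog resolver installed and returns the root
     * element's "version" attribute, which only the DTD supplies.
     */
    static String defaultedVersion(String xml, CatalogResolver resolver) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(resolver);
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getAttribute("version");
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("catalog-sketch");
        CatalogResolver cr = resolverFor(writeCatalog(dir));
        String xml = "<!DOCTYPE properties SYSTEM \"" + PROPERTIES_DTD_ID + "\">"
                   + "<properties><entry key=\"greeting\">hello</entry></properties>";
        System.out.println(defaultedVersion(xml, cr));
    }
}
```

Because the JDK resolver implements EntityResolver, it can be passed to xrdr.setEntityResolver in the same way as the Apache one.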

Jim Garrison answered Sep 23 '22 11:09
