How to load a relative system DTD into a StAX parser?

I am using Woodstox to implement a StAX parser for XML files. Assume that I have a valid XML file and its matching DTD in the same directory on my filesystem.

/path/to/test.xml
/path/to/test.dtd

The XML file references its DTD with a relative system identifier declaration, as follows:

<!DOCTYPE test SYSTEM "test.dtd">

From a validation viewpoint, everything seems fine to me. (Is it? xmllint does not complain.) However, when I try to parse the file with the code below, Woodstox throws a java.io.FileNotFoundException because it cannot find the relative DTD file. It seems that the implementation resolves the DTD path relative to the working directory instead of relative to the XML file.

import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class Test {

    public static void main( String[] args ) throws Exception {

        FileInputStream fileInputStream = new FileInputStream( args[0] );
        XMLInputFactory xmlInputFactory = XMLInputFactory.newFactory();
        XMLStreamReader xsr = xmlInputFactory.createXMLStreamReader(fileInputStream);

        while( xsr.hasNext() ) {
            if( xsr.next() == XMLStreamConstants.DTD ) {
                System.err.println( xsr.getText() );
            }
        }
    }
}
  1. Is this intentional?
  2. Is there a convenient way to convince the StAX parser to load the DTD relative to a given XML file instead of relative to the working directory?
MRA asked May 21 '12 12:05


2 Answers

You will need to provide your own implementation of the XMLResolver interface (the StAX counterpart of EntityResolver in the SAX world) to help the parser find the DTD. XMLInputFactory has a setXMLResolver() method for registering it.

Some more information on the subject:

  • XML Entity and URI Resolvers

It's also a good idea to look under the hood to understand what happens when a parser needs to resolve a SYSTEM URI. Woodstox, for example, has an internal default implementation of XMLResolver (as well as a proxy between SAX's EntityResolver and a StAX XMLResolver). Look at what it does with your DTD "filename" and you will see why it behaves the way it does.
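
For the layout in the question (DTD sitting next to the XML file), a resolver could look roughly like the sketch below. The class name SiblingFileResolver and the helper open() are made up for illustration, and the sketch assumes the parser hands the resolver the system identifier as written in the DOCTYPE ("test.dtd"). Returning null asks the parser to fall back to its default resolution.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLResolver;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical resolver: looks up system identifiers next to the XML file. */
final class SiblingFileResolver implements XMLResolver {

    private final Path baseDir;

    SiblingFileResolver(Path xmlFile) {
        // Directory that contains the XML document, e.g. /path/to
        this.baseDir = xmlFile.toAbsolutePath().getParent();
    }

    @Override
    public Object resolveEntity(String publicID, String systemID,
            String baseURI, String namespace) throws XMLStreamException {
        if (systemID == null) {
            return null; // nothing to resolve; defer to the parser's default
        }
        try {
            Path candidate = baseDir.resolve(systemID);
            if (!Files.isRegularFile(candidate)) {
                return null; // not a sibling file; defer to the parser's default
            }
            return Files.newInputStream(candidate);
        } catch (InvalidPathException e) {
            return null; // identifier is not a usable file path on this platform
        } catch (IOException e) {
            throw new XMLStreamException("Cannot open DTD " + systemID, e);
        }
    }

    /** Creates a reader for xmlFile with the resolver installed. */
    static XMLStreamReader open(Path xmlFile)
            throws IOException, XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        factory.setXMLResolver(new SiblingFileResolver(xmlFile));
        // The caller is responsible for closing this stream when done.
        InputStream in = Files.newInputStream(xmlFile);
        return factory.createXMLStreamReader(in);
    }
}
```

With this in place, the loop from the question can run over SiblingFileResolver.open(Paths.get(args[0])) instead of building the reader from a bare FileInputStream.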

Pavel Veller answered Nov 14 '22 23:11


@Pavel Veller's answer is correct. Here's a concrete example of it in use:

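import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLResolver;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/**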
 * Responsible for parsing the specified XML file and creating objects for
 * insertion into the MySQL database.
 * 
 * @author cameronhudson
 *
 */
public class Parser {

  /**
   * Creates a new XMLStreamReader from the specified file.
   * 
   * @param filename The relative path of the file to load.
   * @return An XMLStreamReader to be used for parsing.
   */
  private static XMLStreamReader getXmlReader(String filename) {

    // Initialize an XMLStreamReader
    XMLStreamReader reader;

    // Instantiate an XMLInputFactory and set an XMLResolver
    XMLInputFactory factory = XMLInputFactory.newInstance();
    factory.setXMLResolver(new XMLResolver() {

      @Override
      public Object resolveEntity(String publicID, String systemID,
          String baseURI, String namespace) throws XMLStreamException {

        /*
         * The systemID argument is the same dtd file specified in the xml file
         * header. For example, if the xml header is <!DOCTYPE dblp SYSTEM
         * "dblp.dtd">, then systemID will be "dblp.dtd".
         * 
         */
        return Parser.filenameToStream(systemID);
      }

    });

    // Get the XML file as an InputStream.
    InputStream stream = Parser.filenameToStream(filename);

    // Instantiate a new XMLStreamReader.
    try {
      reader = factory.createXMLStreamReader(stream);
    } catch (XMLStreamException e) {
      System.err.println(e);
      return null;
    }
    return reader;
  }

  /**
   * Converts a local resource filename into a path dependent on the runtime
   * environment.
   * 
   * @param filename The local path of the resource within /src/main/resources/.
   * @return An input stream of the file.
   */
  private static InputStream filenameToStream(String filename) {
    return Thread.currentThread().getContextClassLoader()
        .getResourceAsStream(filename);
  }

}
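
Note that the example above loads both the XML file and the DTD from the classpath. For plain files on disk, as in the question, another option that may be enough on its own is to tell the parser where the document lives by using the standard createXMLStreamReader(String systemId, InputStream stream) overload, so that a relative "test.dtd" has a base to resolve against. The sketch below assumes the StAX implementation honors that base URI; the class name SystemIdExample and the helper textOf() are invented for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: parses a file and passes its URI as the system ID. */
final class SystemIdExample {

    /** Parses xmlFile and returns its concatenated character data. */
    static String textOf(File xmlFile) throws IOException, XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        try (InputStream in = new FileInputStream(xmlFile)) {
            // The document's own URI serves as the base for relative
            // system identifiers such as "test.dtd" (assumed behavior).
            String systemId = xmlFile.toURI().toString();
            XMLStreamReader reader = factory.createXMLStreamReader(systemId, in);
            StringBuilder text = new StringBuilder();
            try {
                while (reader.hasNext()) {
                    reader.next();
                    if (reader.isCharacters()) {
                        text.append(reader.getText());
                    }
                }
            } finally {
                reader.close(); // does not close the underlying stream
            }
            return text.toString();
        }
    }
}
```

If the parser still looks in the working directory with this variant, an explicit XMLResolver as described in the other answer remains the fallback.".getBytes(java.nio.charset.StandardCharsets.UTF_8));
String text = SystemIdExample.textOf(xml.toFile());
if (!text.contains("hello")) throw new AssertionError("entity from relative DTD not expanded: " + text);
</test>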
Cameron Hudson answered Nov 14 '22 22:11