Extremely slow XSLT transformation in Java

Question

I try to transform XML document using XSLT. As an input I have www.wordpress.org XHTML source code, and XSLT is dummy example retrieving site's title (actually it could do nothing - it doesn't change anything).

Every single API or library I use, transformation takes about 2 minutes! If you take a look at wordpress.org source, you will notice that it is only 183 lines of code. As I googled it is probably due to DOM tree building. No matter how simple XSLT is, it is always 2 minutes - so it confirms idea that it's related to DOM building, but anyway it should not take 2 minutes in my opinion.

Here is an example code (nothing special):

  TransformerFactory tFactory = TransformerFactory.newInstance();
   Transformer transformer = null;

   try {
       transformer = tFactory.newTransformer(
           new StreamSource("/home/pd/XSLT/transf.xslt"));

   } catch (TransformerConfigurationException e) {
       e.printStackTrace();
   }

   ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

   System.out.println("START");
   try {
       transformer.transform(new SAXSource(new InputSource(
           new FileInputStream("/home/pd/XSLT/wordpress.xml"))),
           new StreamResult(outputStream));
   } catch (TransformerException e) {       
       e.printStackTrace();
   } catch (IOException e) {
       e.printStackTrace();
   }
   System.out.println("STOP");

   System.out.println(new String(outputStream.toByteArray()));

It's between START and STOP where java "pauses" for 2 minutes. If I take a look at the processor or memory usage, nothing increases. It looks like really JVM stopped...

Do you have any experience in transforming XMLs that are longer than 50 (this is random number ;)) lines? As I read XSLT always needs to build DOM tree in order to do its work. Fast transformation is crucial for me.

Thanks in advance, Piotr

Phil M · Accepted Answer

Does the sample HTML file use namespaces? If so, your XML parser may be attempting to retrieve contents (a schema, perhaps) from the namespace URIs. This is likely if each run takes exactly two minutes -- it's likely one or more TCP timeouts.

You can verify this by timing how long it takes to instantiate your InputSource object (where the WordPress XML is actually parsed), as this is likely the line which is causing the delay. After reviewing the sample file you posted, it does include a declared namespace (xmlns="http://www.w3.org/1999/xhtml").

To work around this, you can implement your own EntityResolver which essentially disables the URL-based resolution. You may need to use a DOM -- see DocumentBuilder's setEntityResolver method.

Here's a sample using DOM and disabling resolution (note -- this is untested):

try {
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbFactory.newDocumentBuilder();
    db.setEntityResolver(new EntityResolver() {

        @Override
        public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
            return null; // Never resolve any IDs
        }
    });

    System.out.println("BUILDING DOM");

    Document doc = db.parse(new FileInputStream("/home/pd/XSLT/wordpress.xml"));

    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

    TransformerFactory tFactory = TransformerFactory.newInstance();
    Transformer transformer = tFactory.newTransformer(
        new StreamSource("/home/pd/XSLT/transf.xslt"));

    System.out.println("RUNNING TRANSFORM");

    transformer.transform(
            new DOMSource(doc.getDocumentElement()),
            new StreamResult(outputStream));

    System.out.println("TRANSFORMED CONTENTS BELOW");
    System.out.println(outputStream.toString());
} catch (Exception e) {
    e.printStackTrace();
}

If you want to use SAX, you would have to use a SAXSource with an XMLReader which uses your custom resolver.

Karthik Ramachandran · Answer

The commenters who've posted that the answer likely resides with the EntityResolver are probably correct. However, the solution may not be to simply not load the schemas but rather load them from the local file system.

So you could do something like this

  db.setEntityResolver(new EntityResolver() {

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
        try {
        FileInputStream fis = new FileInputStream(new File("classpath:xsd/" + systemId));
        InputSource is  = new InputSource(fis);
        return is
    } catch (FileNotFoundException ex) {
        logger.error("File Not found", ex);
        return null;
    }
    }
});

Extremely slow XSLT transformation in Java

Tags:

java

xml

xslt

omnomnom

2 Answers

Phil M

Karthik Ramachandran

Recent Activity

Donate For Us

Extremely slow XSLT transformation in Java

Tags:

java

xml

xslt

omnomnom

2 Answers

Phil M

Karthik Ramachandran

Related questions

Recent Activity

Donate For Us