Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using SAX to parse common XML elements

Tags:

java

xml

sax

I'm currently using SAX (Java) to parse a a handful of different XML documents, with each document representing different data and having slightly different structures. For this reason, each XML document is handled by a different SAX class (subclassing DefaultHandler).

However, there are some XML structures that can appear in all these different documents. Ideally, I'd like to tell the parser "Hey, when you reach a complex_node element, just use ComplexNodeHandler to read it, and give me back the result. If you reach a some_other_node, use OtherNodeHandler to read it and give me back that result".

However, I can't see an obvious way to do this.

Should I simply just make a monolithic handler class that can read all the different documents I have (and eradicate duplication of code), or is there a smarter way to handle this?

like image 746
Dave Avatar asked Aug 04 '10 12:08

Dave


2 Answers

Below is an answer I made to a similar question (Skipping nodes with sax). It demonstrates how to swap content handlers on an XMLReader.

In this example the swapped in ContentHandler simply ignores all events until it gives up control, but you could adapt the concept easily.


You could do something like the following:

import javax.xml.parsers.SAXParser; 
import javax.xml.parsers.SAXParserFactory; 
import org.xml.sax.XMLReader; 

public class Demo { 

    public static void main(String[] args) throws Exception { 
        SAXParserFactory spf = SAXParserFactory.newInstance(); 
        SAXParser sp = spf.newSAXParser(); 
        XMLReader xr = sp.getXMLReader(); 
        xr.setContentHandler(new MyContentHandler(xr)); 
        xr.parse("input.xml"); 
    } 
} 

MyContentHandler

This class is responsible for processing your XML document. When you hit a node you want to ignore you can swap in the IgnoringContentHandler which will swallow all events for that node.

import org.xml.sax.Attributes; 
import org.xml.sax.ContentHandler; 
import org.xml.sax.Locator; 
import org.xml.sax.SAXException; 
import org.xml.sax.XMLReader; 

public class MyContentHandler implements ContentHandler { 

    private XMLReader xmlReader; 

    public MyContentHandler(XMLReader xmlReader) { 
        this.xmlReader = xmlReader; 
    } 

    public void setDocumentLocator(Locator locator) { 
    } 

    public void startDocument() throws SAXException { 
    } 

    public void endDocument() throws SAXException { 
    } 

    public void startPrefixMapping(String prefix, String uri) 
            throws SAXException { 
    } 

    public void endPrefixMapping(String prefix) throws SAXException { 
    } 

    public void startElement(String uri, String localName, String qName, 
            Attributes atts) throws SAXException { 
        if("sodium".equals(qName)) { 
            xmlReader.setContentHandler(new IgnoringContentHandler(xmlReader, this)); 
        } else { 
            System.out.println("START " + qName); 
        } 
    } 

    public void endElement(String uri, String localName, String qName) 
            throws SAXException { 
        System.out.println("END " + qName); 
    } 

    public void characters(char[] ch, int start, int length) 
            throws SAXException { 
        System.out.println(new String(ch, start, length)); 
    } 

    public void ignorableWhitespace(char[] ch, int start, int length) 
            throws SAXException { 
    } 

    public void processingInstruction(String target, String data) 
            throws SAXException { 
    } 

    public void skippedEntity(String name) throws SAXException { 
    } 

} 

IgnoringContentHandler

When the IgnoringContentHandler is done swallowing events it passes control back to your main ContentHandler.

import org.xml.sax.Attributes; 
import org.xml.sax.ContentHandler; 
import org.xml.sax.Locator; 
import org.xml.sax.SAXException; 
import org.xml.sax.XMLReader; 

public class IgnoringContentHandler implements ContentHandler { 

    private int depth = 1; 
    private XMLReader xmlReader; 
    private ContentHandler contentHandler; 

    public IgnoringContentHandler(XMLReader xmlReader, ContentHandler contentHandler) { 
        this.contentHandler = contentHandler; 
        this.xmlReader = xmlReader; 
    } 

    public void setDocumentLocator(Locator locator) { 
    } 

    public void startDocument() throws SAXException { 
    } 

    public void endDocument() throws SAXException { 
    } 

    public void startPrefixMapping(String prefix, String uri) 
            throws SAXException { 
    } 

    public void endPrefixMapping(String prefix) throws SAXException { 
    } 

    public void startElement(String uri, String localName, String qName, 
            Attributes atts) throws SAXException { 
        depth++; 
    } 

    public void endElement(String uri, String localName, String qName) 
            throws SAXException { 
        depth--; 
        if(0 == depth) { 
           xmlReader.setContentHandler(contentHandler); 
        } 
    } 

    public void characters(char[] ch, int start, int length) 
            throws SAXException { 
    } 

    public void ignorableWhitespace(char[] ch, int start, int length) 
            throws SAXException { 
    } 

    public void processingInstruction(String target, String data) 
            throws SAXException { 
    } 

    public void skippedEntity(String name) throws SAXException { 
    } 

} 
like image 73
bdoughan Avatar answered Oct 20 '22 21:10

bdoughan


You could have one handler (ComplexNodeHandler) that handles only some parts of a document (complex_node) and passes all other pieces to another handler. The constructor for ComplexNodeHandler would take the other handler as a parameter. I mean something like this:

class ComplexNodeHandler {

    private ContentHandler handlerForOtherNodes;

    public ComplexNodeHandler(ContentHandler handlerForOtherNodes) {
         this.handlerForOtherNodes = handlerForOtherNodes;
    }

    ...

    public startElement(String uri, String localName, String qName, Attributes atts) {
        if (currently in complex node) {
            [handle complex node data] 
        } else {
            // pass the event to the document specific handler
            handlerForOtherNodes.startElement(uri, localName, qName, atts);
       }
    } 

    ...

}

There could be better alternatives still since I'm not that familiar with SAX. Writing a base handler for the common parts and inheriting it could work too but I'm not sure if using inheritance here is a good idea.

like image 22
COME FROM Avatar answered Oct 20 '22 19:10

COME FROM