 

Is there a way to parse XML via SAX/DOM with line numbers available per node

Tags: java, dom, xml, sax

I have already written a DOM parser for a large XML document format that contains a number of items that can be used to automatically generate Java code. This is limited to small expressions that are then merged into a dynamically generated Java source file.

So far - so good. Everything works.

BUT - I wish to be able to embed the line number of the XML node where the Java code was included from (so that if the configuration contains uncompilable code, each method will have a pointer to the source XML document and the line number for ease of debugging). I don't require the line number at parse-time and I don't need to validate the XML Source Document and throw an error at a particular line number. I need to be able to access the line number for each node and attribute in my DOM or per SAX event.

Any suggestions on how I might be able to achieve this?

P.S. I read that StAX has a method to obtain the line number whilst parsing, but ideally I would like to achieve the same result with regular SAX/DOM processing in Java 4/5 rather than become a Java 6+ application or take on extra .jar files.
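For comparison, here is a minimal sketch of the StAX approach (`javax.xml.stream`, bundled from Java 6). The class name `StaxLines` and the sample document are hypothetical. `XMLStreamReader.getLocation()` describes the current event, and an implementation may return -1 when it does not know the line.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxLines {

    // Returns "elementName@line" for every start tag, in document order.
    public static List<String> startTagLines(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> result = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    // getLocation() refers to the event just read; -1 means "unknown".
                    result.add(reader.getLocalName() + "@" + reader.getLocation().getLineNumber());
                }
            }
        } finally {
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<config>\n  <method name=\"a\"/>\n  <method name=\"b\"/>\n</config>";
        System.out.println(startTagLines(xml));
    }
}
```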

asked May 09 '10 16:05 by Chris


People also ask

How can XML data be parsed using DOM and SAX?

The two common ways to parse an XML document are:
DOM parser: loads the entire document and builds a hierarchical tree structure from it.
SAX parser: reports event-based callbacks as it reads and does not need to load the complete content.
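A minimal sketch contrasting the two approaches with the standard JAXP factories: the DOM version builds the whole tree before counting elements, while the SAX version counts them as `startElement` events arrive. The class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class DomVsSax {

    // DOM: the whole document is loaded into an in-memory tree, then queried.
    public static int countWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // "*" matches every element, including the root.
        return doc.getElementsByTagName("*").getLength();
    }

    // SAX: no tree is built; the handler reacts to each event as it streams past.
    public static int countWithSax(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                count[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }
}
```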

How is SAX an alternative method for parsing an XML document?

SAX (Simple API for XML) is an event-driven API for parsing XML documents and an alternative to the Document Object Model (DOM). Whereas a DOM parser reads the whole document into memory before you operate on it, a SAX parser reads the XML node by node, issuing parsing events as it steps through the input stream.
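To make the event-driven model concrete, this sketch (hypothetical class `SaxEvents`) logs each callback in the order the parser issues it. A parser may split a single text run across several `characters` calls, so real handlers should buffer text rather than assume one call per run.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEvents {

    // Records each SAX callback in the order the parser issues it.
    public static List<String> events(String xml) throws Exception {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called more than once for one run of text.
                log.add("text " + new String(ch, start, length));
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }
}
```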

How does a DOM parser parse an XML file?

A DOM (Document Object Model) parser reads an XML document and builds an in-memory tree of objects (document, elements, attributes, text) from which the required information can then be extracted. In general, a DOM parser loads the entire XML file into memory before you can work with it.

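Is a SAX parser faster than a DOM parser?

Usually, yes. A SAX parser is typically faster and uses far less memory than a DOM parser, because it streams events instead of building an in-memory tree of the whole document. DOM is more convenient when you need random access to, or modification of, the whole tree.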


2 Answers

I know this thread is a little old (sorry), but it has taken me so long to crack this nut that I had to share the solution with someone...

You only seem to be able to obtain line numbers with SAX, which doesn't build a DOM. The DOM parser does not give the line numbers, and neither does it let you near the SAX parser it is using. My solution is to do an empty (identity) XSLT transformation using a SAX source and a DOM result, but even then someone has done their best to hide this. See the code below.

I add the location information to each element as an attribute with my own namespace, so I can find elements using XPath and report where the data came from.

Hope this helps:

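import java.io.FileReader;
import java.io.Reader;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;

import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.Attributes2Impl;
import org.xml.sax.helpers.XMLFilterImpl;
import org.xml.sax.helpers.XMLReaderFactory;

public class LocationDom {

    public static void main(String[] args) throws Exception {
        // The file to parse.
        String systemId = "myxml.xml";

        /*
         * Create a transformer SAX source that adds the current element position
         * to each element as an attribute.
         */
        XMLReader xmlReader = XMLReaderFactory.createXMLReader();
        LocationFilter locationFilter = new LocationFilter(xmlReader);

        Reader reader = new FileReader(systemId);
        try {
            InputSource inputSource = new InputSource(reader);
            // Do this so that the XPath function document() can take a relative URI.
            inputSource.setSystemId(systemId);
            SAXSource saxSource = new SAXSource(locationFilter, inputSource);

            /*
             * Perform an empty (identity) transformation from the SAX source to a DOM result.
             */
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            DOMResult domResult = new DOMResult();
            transformer.transform(saxSource, domResult);
            Node root = domResult.getNode();
            // ... work with root here
        } finally {
            reader.close();
        }
    }
}

class LocationFilter extends XMLFilterImpl {

    private Locator locator = null;

    LocationFilter(XMLReader xmlReader) {
        super(xmlReader);
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        super.setDocumentLocator(locator);
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        // A parser is not required to supply a Locator; pass the element through unchanged if it didn't.
        if (locator == null) {
            super.startElement(uri, localName, qName, attributes);
            return;
        }
        // Add an extra attribute to the element to hold its location.
        String location = locator.getSystemId() + ':' + locator.getLineNumber() + ':' + locator.getColumnNumber();
        Attributes2Impl attrs = new Attributes2Impl(attributes);
        attrs.addAttribute("http://myNamespace", "location", "myns:location", "CDATA", location);
        super.startElement(uri, localName, qName, attrs);
    }
}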
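A sketch of the XPath lookup mentioned at the start of this answer, assuming the `http://myNamespace` namespace and `myns` prefix used by `LocationFilter`. It carries a compact copy of the filter so it can run on its own; `LocationLookup`, `locationOf`, and the system ID `inline.xml` are hypothetical names. The prefix used in the XPath expression is bound through a `NamespaceContext`, because XPath matches attributes by namespace URI, not by the prefix written in the document.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.Attributes2Impl;
import org.xml.sax.helpers.XMLFilterImpl;

public class LocationLookup {

    static final String NS = "http://myNamespace";

    // Compact copy of the LocationFilter idea: tag each element with "systemId:line:column".
    static class LocationFilter extends XMLFilterImpl {
        private Locator locator;

        LocationFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void setDocumentLocator(Locator locator) {
            super.setDocumentLocator(locator);
            this.locator = locator;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            Attributes2Impl attrs = new Attributes2Impl(atts);
            if (locator != null) {
                attrs.addAttribute(NS, "location", "myns:location", "CDATA",
                        locator.getSystemId() + ':' + locator.getLineNumber() + ':' + locator.getColumnNumber());
            }
            super.startElement(uri, localName, qName, attrs);
        }
    }

    // Builds the DOM with an identity transform, then reads one element's location via XPath.
    public static String locationOf(String xml, String elementPath) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        LocationFilter filter = new LocationFilter(spf.newSAXParser().getXMLReader());
        InputSource in = new InputSource(new StringReader(xml));
        in.setSystemId("inline.xml"); // hypothetical ID; only used in the location text
        DOMResult result = new DOMResult();
        TransformerFactory.newInstance().newTransformer().transform(new SAXSource(filter, in), result);

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the "myns" prefix used in the expression to the filter's namespace URI.
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "myns".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }

            public String getPrefix(String uri) {
                return NS.equals(uri) ? "myns" : null;
            }

            public Iterator<String> getPrefixes(String uri) {
                return NS.equals(uri) ? Collections.singletonList("myns").iterator()
                        : Collections.<String>emptyList().iterator();
            }
        });
        return xpath.evaluate(elementPath + "/@myns:location", result.getNode());
    }
}
```

For example, `LocationLookup.locationOf("<a>\n  <b/>\n</a>", "//b")` should return a string whose last two colon-separated fields are the line (2) and column of `<b/>`; the exact system ID prefix depends on how the parser expands `inline.xml`.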
answered Nov 03 '22 00:11 by Reg Whitton


I ran into this issue recently and thought I'd share a ready-made utility class for handling it. It works with Java 11, whereas some of Reg Whitton's code uses classes that are now deprecated.

Mostly based on this article with a few tweaks. Notably, it stores the line number as the node's user data rather than setting it as an attribute.

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayDeque;
import java.util.Deque;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlDom {

    public static Document readXML(InputStream is, final String lineNumAttribName) throws IOException, SAXException {
        final Document doc;
        SAXParser parser;
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            parser = factory.newSAXParser();
            DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
            doc = docBuilder.newDocument();           
        } catch(ParserConfigurationException e){
            throw new RuntimeException("Can't create SAX parser / DOM builder.", e);
        }

        final Deque<Element> elementStack = new ArrayDeque<>();
        final StringBuilder textBuffer = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator; //Save the locator, so that it can be used later for line tracking when traversing nodes.
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {               
                addTextIfNeeded();
                Element el = doc.createElement(qName);
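                for (int i = 0; i < attributes.getLength(); i++) {
                    el.setAttribute(attributes.getQName(i), attributes.getValue(i));
                }
                // Store the line number as user data rather than as an attribute. The
                // Locator reports the line on which the start tag ends; it may be null
                // if the parser does not supply one.
                if (locator != null) {
                    el.setUserData(lineNumAttribName, String.valueOf(locator.getLineNumber()), null);
                }
                elementStack.push(el);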
            }

            @Override
            public void endElement(String uri, String localName, String qName){
                addTextIfNeeded();
                Element closedEl = elementStack.pop();
                if (elementStack.isEmpty()) { // Is this the root element?
                    doc.appendChild(closedEl);
                } else {
                    Element parentEl = elementStack.peek();
                    parentEl.appendChild(closedEl);                   
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                textBuffer.append(ch, start, length);
            }

            // Outputs text accumulated under the current node
            private void addTextIfNeeded() {
                if (textBuffer.length() > 0) {
                    Element el = elementStack.peek();
                    Node textNode = doc.createTextNode(textBuffer.toString());
                    el.appendChild(textNode);
                    textBuffer.delete(0, textBuffer.length());
                }
            }           
        };
        parser.parse(is, handler);

        return doc;
    }   

}

Access the line number with (a cast is needed because getUserData returns Object):

(String) node.getUserData(lineNumAttribName);
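One consequence of using user data rather than an attribute is that the value is not serialized and, with the `null` handler passed in `readXML`, is not carried over when a node is cloned or imported. This sketch assumes you want it to survive cloning: a `UserDataHandler` re-attaches the data to the copy. `LineUserData`, `KEY`, and `COPY_ON_CLONE` are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.UserDataHandler;

public class LineUserData {

    static final String KEY = "lineNumber"; // hypothetical user-data key

    // Re-attaches the user data to the new node when a node is cloned or imported;
    // with a null handler the copy would not carry the line number.
    static final UserDataHandler COPY_ON_CLONE = new UserDataHandler() {
        @Override
        public void handle(short operation, String key, Object data, Node src, Node dst) {
            if ((operation == NODE_CLONED || operation == NODE_IMPORTED) && dst != null) {
                dst.setUserData(key, data, this);
            }
        }
    };

    // getUserData returns Object, so callers cast back to the stored type.
    public static String lineOf(Node node) {
        return (String) node.getUserData(KEY);
    }

    // Creates an element tagged with a line number and returns a deep clone of it.
    public static Element demo() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element el = doc.createElement("method");
        el.setUserData(KEY, "42", COPY_ON_CLONE);
        return (Element) el.cloneNode(true);
    }
}
```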
answered Nov 03 '22 00:11 by Kris