Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to track the source line (location) of an XML element?

I assume that there is probably no satisfactory answer to this question, but I ask it anyway in case I missed something.

Basically, I want to find out the line in the source document from which a certain XML element originated, given the element instance. I want this only for better diagnostic error messages - the XML is part of a configuration file, and if there is something wrong with it, I want to be able to point the reader of the error message to exactly the right place in the XML document so he can correct the error.

I understand that the standard Scala XML support probably has no built-in feature like this. After all, it would be wasteful to annotate every single NodeSeq instance with such information, and not every XML element even has a source document from which it has been parsed. It seems to me that the standard Scala XML parser throws the line information away, and later on there is no way to retrieve it.

But switching to another XML framework is not an option. Adding another library dependency "only" for the sake of better diagnostic error messages seems inappropriate to me. Also, despite some shortcomings, I really like the built-in pattern matching support for XML.

My only hope is that you can show me a way to alter or subclass the standard Scala XML parser such that the nodes it produces will be annotated with the number of the source line. Maybe a special subclass of NodeSeq can be created for this. Or maybe only Atom can be subclassed because NodeSeq is too dynamic? I don't know.

Anyway, my hopes are close to zero. I don't think there is a place in the parser where we can hook in to change the way nodes are created, and that at that place the line information is available. Still, I wonder why I have not found this question before. Please point me to the original if this is a duplicate.

like image 959
Madoc Avatar asked Dec 15 '10 02:12

Madoc


People also ask

How can we access the data from XML elements?

Retrieving information from XML files by using the Document Object Model, XmlReader class, XmlDocument class, and XmlNode class. Synchronizing DataSet data with XML via the XmlDataDocument class. Executing XML queries with XPath and the XPathNavigator class.

What is XML tag?

XML tags are the important features of XML document. It is similar to HTML but XML is more flexible then HTML. It allows to create new tags (user defined tags). The first element of XML document is called root element. The simple XML document contain opening tag and closing tag.

Can we have empty XML tags?

Empty XML ElementsAn element with no content is said to be empty. The two forms produce identical results in XML software (Readers, Parsers, Browsers). Empty elements can have attributes.


1 Answers

I had no idea how to do that, but Pangea showed me the way. First, let's create a trait to handle location:

import org.xml.sax.{helpers, Locator, SAXParseException}
trait WithLocation extends helpers.DefaultHandler {
    var locator: org.xml.sax.Locator = _
    def printLocation(msg: String) {
        println("%s at line %d, column %d" format (msg, locator.getLineNumber, locator.getColumnNumber))
    }

    // Get location
    abstract override def setDocumentLocator(locator: Locator) {
        this.locator = locator
        super.setDocumentLocator(locator)
    }

    // Display location messages
    abstract override def warning(e: SAXParseException) {
        printLocation("warning")
        super.warning(e)
    }
    abstract override def error(e: SAXParseException) {
        printLocation("error")
        super.error(e)
    }
    abstract override def fatalError(e: SAXParseException) {
        printLocation("fatal error")
        super.fatalError(e)
    }
}

Next, let's create our own loader overriding XMLLoader's adapter to include our trait:

import scala.xml.{factory, parsing, Elem}
object MyLoader extends factory.XMLLoader[Elem] {
    override def adapter = new parsing.NoBindingFactoryAdapter with WithLocation
}

And that's all there is to it! The object XML adds little to XMLLoader -- basically, the save methods. You might want to look at its source code if you feel the need for a full replacement. But this is only if you want to handle all of this yourself, since Scala already have a trait to produce errors:

object MyLoader extends factory.XMLLoader[Elem] {
    override def adapter = new parsing.NoBindingFactoryAdapter with parsing.ConsoleErrorHandler
}

The ConsoleErrorHandler trait extract its line and number information from the exception, by the way. For our purposes, we need the location outside exceptions too (I'm assuming).

Now, to modify node creation itself, look at the scala.xml.factory.FactoryAdapter abstract methods. I have settled on createNode, but I'm overriding at the NoBindingFactoryAdapter level, because that returns Elem instead of Node, which enables me to add attributes. So:

import org.xml.sax.Locator
import scala.xml._
import parsing.NoBindingFactoryAdapter
trait WithLocation extends NoBindingFactoryAdapter {
    var locator: org.xml.sax.Locator = _

    // Get location
    abstract override def setDocumentLocator(locator: Locator) {
        this.locator = locator
        super.setDocumentLocator(locator)
    }

    abstract override def createNode(pre: String, label: String, attrs: MetaData, scope: NamespaceBinding, children: List[Node]): Elem = (
        super.createNode(pre, label, attrs, scope, children) 
        % Attribute("line", Text(locator.getLineNumber.toString), Null) 
        % Attribute("column", Text(locator.getColumnNumber.toString), Null)
    )
}

object MyLoader extends factory.XMLLoader[Elem] {
    // Keeping ConsoleErrorHandler for good measure
    override def adapter = new parsing.NoBindingFactoryAdapter with parsing.ConsoleErrorHandler with WithLocation
}

Result:

scala> MyLoader.loadString("<a><b/></a>")
res4: scala.xml.Elem = <a line="1" column="12"><b line="1" column="8"></b></a>

Note that it got the last location, the one at the closing tag. That's one thing that can be improved by overriding startElement to keep track of where each element started in a stack, and endElement to pop from this stack into a var used by createNode.

Nice question. I learned a lot! :-)

like image 102
Daniel C. Sobral Avatar answered Sep 23 '22 09:09

Daniel C. Sobral