I assume that there is probably no satisfactory answer to this question, but I ask it anyway in case I missed something.
Basically, I want to find out the line in the source document from which a certain XML element originated, given the element instance. I want this only for better diagnostic error messages - the XML is part of a configuration file, and if there is something wrong with it, I want to be able to point the reader of the error message to exactly the right place in the XML document so he can correct the error.
I understand that the standard Scala XML support probably has no built-in feature like this. After all, it would be wasteful to annotate every single NodeSeq instance with such information, and not every XML element even has a source document from which it has been parsed. It seems to me that the standard Scala XML parser throws the line information away, and later on there is no way to retrieve it.
But switching to another XML framework is not an option. Adding another library dependency "only" for the sake of better diagnostic error messages seems inappropriate to me. Also, despite some shortcomings, I really like the built-in pattern matching support for XML.
My only hope is that you can show me a way to alter or subclass the standard Scala XML parser such that the nodes it produces will be annotated with the number of the source line. Maybe a special subclass of NodeSeq can be created for this. Or maybe only Atom can be subclassed because NodeSeq is too dynamic? I don't know.
Anyway, my hopes are close to zero. I don't think the parser offers a hook where we can change how nodes are created and where the line information is still available. Still, I wonder why I have not found this question asked before. Please point me to the original if this is a duplicate.
I had no idea how to do that, but Pangea showed me the way. First, let's create a trait to handle location:
import org.xml.sax.{helpers, Locator, SAXParseException}

trait WithLocation extends helpers.DefaultHandler {
  var locator: org.xml.sax.Locator = _

  def printLocation(msg: String): Unit = {
    println("%s at line %d, column %d" format (msg, locator.getLineNumber, locator.getColumnNumber))
  }

  // Get location: the SAX parser hands us its Locator before parsing starts
  abstract override def setDocumentLocator(locator: Locator): Unit = {
    this.locator = locator
    super.setDocumentLocator(locator)
  }

  // Display location messages, then delegate to the underlying handler
  abstract override def warning(e: SAXParseException): Unit = {
    printLocation("warning")
    super.warning(e)
  }

  abstract override def error(e: SAXParseException): Unit = {
    printLocation("error")
    super.error(e)
  }

  abstract override def fatalError(e: SAXParseException): Unit = {
    printLocation("fatal error")
    super.fatalError(e)
  }
}
Next, let's create our own loader, overriding XMLLoader's adapter to include our trait:
import scala.xml.{factory, parsing, Elem}

object MyLoader extends factory.XMLLoader[Elem] {
  override def adapter = new parsing.NoBindingFactoryAdapter with WithLocation
}
And that's all there is to it! The object XML adds little to XMLLoader: basically, the save methods. You might want to look at its source code if you feel the need for a full replacement. But this is only necessary if you want to handle all of this yourself, since Scala already has a trait to report errors:
object MyLoader extends factory.XMLLoader[Elem] {
  override def adapter = new parsing.NoBindingFactoryAdapter with parsing.ConsoleErrorHandler
}
The ConsoleErrorHandler trait extracts its line and column information from the exception, by the way. For our purposes, we need the location outside exceptions too (I'm assuming).
Now, to modify node creation itself, look at the abstract methods of scala.xml.parsing.FactoryAdapter. I have settled on createNode, but I'm overriding at the NoBindingFactoryAdapter level, because that returns Elem instead of Node, which enables me to add attributes. So:
import org.xml.sax.Locator
import scala.xml._
import parsing.NoBindingFactoryAdapter

trait WithLocation extends NoBindingFactoryAdapter {
  var locator: org.xml.sax.Locator = _

  // Get location
  abstract override def setDocumentLocator(locator: Locator): Unit = {
    this.locator = locator
    super.setDocumentLocator(locator)
  }

  // Append the parser's current position to every element as it is created
  abstract override def createNode(
      pre: String,
      label: String,
      attrs: MetaData,
      scope: NamespaceBinding,
      children: List[Node]): Elem = (
    super.createNode(pre, label, attrs, scope, children)
      % Attribute("line", Text(locator.getLineNumber.toString), Null)
      % Attribute("column", Text(locator.getColumnNumber.toString), Null)
  )
}

object MyLoader extends factory.XMLLoader[Elem] {
  // Keeping ConsoleErrorHandler for good measure
  override def adapter = new parsing.NoBindingFactoryAdapter with parsing.ConsoleErrorHandler with WithLocation
}
Result:
scala> MyLoader.loadString("<a><b/></a>")
res4: scala.xml.Elem = <a line="1" column="12"><b line="1" column="8"></b></a>
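Since the goal is better diagnostics, here is a minimal sketch of how such attributes could be turned into an error message. The object Diagnostics and the method errorAt are hypothetical names of mine, not part of scala.xml; the sketch only assumes that elements carry "line" and "column" attributes like the ones in the result shown here.

```scala
import scala.xml.{Elem, XML}

// Hypothetical helper, not part of scala.xml.
object Diagnostics {
  // Builds a message from the "line" and "column" attributes of an element.
  // Falls back to "?" when an attribute is missing, e.g. for elements that
  // were built in code rather than parsed by the location-aware loader.
  def errorAt(elem: Elem, msg: String): String = {
    def attr(name: String): String =
      Option((elem \ ("@" + name)).text).filter(_.nonEmpty).getOrElse("?")
    s"$msg in <${elem.label}> at line ${attr("line")}, column ${attr("column")}"
  }
}
```

For example, Diagnostics.errorAt(elem, "unknown option") could be used wherever the configuration is validated, with elem being whichever element failed the check.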
Note that the attributes hold the last location seen, the one at the closing tag, because createNode is only called once the element has ended. That's one thing that can be improved by overriding startElement to keep track of where each element started in a stack, and endElement to pop from this stack into a var used by createNode.
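The stack idea can be sketched roughly as follows. I have not run this against every scala-xml version, so treat it as a starting point: the names WithStartLocation and StartLocationLoader are made up for this example, and it relies on FactoryAdapter calling createNode from inside endElement. Keep in mind that SAX defines the locator position as the first character after the text of the event, so the recorded position is just past the start tag, not at its opening bracket.

```scala
import org.xml.sax.{Attributes, Locator}
import scala.xml._
import scala.xml.parsing.NoBindingFactoryAdapter

// Hypothetical trait name; not part of scala.xml.
trait WithStartLocation extends NoBindingFactoryAdapter {
  private var docLocator: Locator = _
  // (line, column) recorded at each startElement, innermost element first
  private var starts: List[(Int, Int)] = Nil
  // Start position of the element that is currently being closed
  private var current: (Int, Int) = (-1, -1)

  override def setDocumentLocator(l: Locator): Unit = {
    docLocator = l
    super.setDocumentLocator(l)
  }

  override def startElement(uri: String, localName: String, qname: String, attributes: Attributes): Unit = {
    // A parser is not required to supply a Locator, so guard against null
    val pos =
      if (docLocator != null) (docLocator.getLineNumber, docLocator.getColumnNumber)
      else (-1, -1)
    starts = pos :: starts
    super.startElement(uri, localName, qname, attributes)
  }

  override def endElement(uri: String, localName: String, qname: String): Unit = {
    // Pop the matching start position before super.endElement builds the node
    current = starts.head
    starts = starts.tail
    super.endElement(uri, localName, qname)
  }

  override def createNode(pre: String, label: String, attrs: MetaData, scope: NamespaceBinding, children: List[Node]): Elem =
    super.createNode(pre, label, attrs, scope, children) %
      Attribute("line", Text(current._1.toString), Null) %
      Attribute("column", Text(current._2.toString), Null)
}

// Hypothetical loader name, analogous to MyLoader
object StartLocationLoader extends factory.XMLLoader[Elem] {
  override def adapter: parsing.FactoryAdapter = new NoBindingFactoryAdapter with WithStartLocation
}
```

With a document spread over several lines, each element should now report the line of its start tag rather than the line of its closing tag.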
Nice question. I learned a lot! :-)