I'm currently using xml.dom.minidom to parse some XML in python. After parsing, I'm doing some reporting on the content, and would like to report the line (and column) where the tag started in the source XML document, but I don't see how that's possible.
I'd like to stick with xml.dom / xml.dom.minidom if possible, but if I need to use a SAX parser to get the origin info, I can do that -- ideal in that case would be using SAX to track node location, but still end up with a DOM for my post-processing.
Any suggestions on how to do this? Hopefully I'm just overlooking something in the docs and this extremely easy.
By monkeypatching the minidom content handler I was able to record line and column number for each node (as the 'parse_position' attribute). It's a little dirty, but I couldn't see any "officially sanctioned" way of doing it :) Here's my test script:
from xml.dom import minidom
import xml.sax
doc = """\
<File>
<name>Name</name>
<pos>./</pos>
</File>
"""
def set_content_handler(dom_handler):
def startElementNS(name, tagName, attrs):
orig_start_cb(name, tagName, attrs)
cur_elem = dom_handler.elementStack[-1]
cur_elem.parse_position = (
parser._parser.CurrentLineNumber,
parser._parser.CurrentColumnNumber
)
orig_start_cb = dom_handler.startElementNS
dom_handler.startElementNS = startElementNS
orig_set_content_handler(dom_handler)
parser = xml.sax.make_parser()
orig_set_content_handler = parser.setContentHandler
parser.setContentHandler = set_content_handler
dom = minidom.parseString(doc, parser)
pos = dom.firstChild.parse_position
print("Parent: '{0}' at {1}:{2}".format(
dom.firstChild.localName, pos[0], pos[1]))
for child in dom.firstChild.childNodes:
if child.localName is None:
continue
pos = child.parse_position
print "Child: '{0}' at {1}:{2}".format(child.localName, pos[0], pos[1])
It outputs the following:
Parent: 'File' at 1:0
Child: 'name' at 2:2
Child: 'pos' at 3:2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With