Does anyone know how I would get a DOM instance (tree) of an XML file in Python? I am trying to compare two XML documents that may have elements and attributes in a different order. How would I do this?
Personally, whenever possible, I'd start with ElementTree (preferably the C implementation that comes with Python's standard library, or the lxml implementation, but that's essentially only a matter of higher speed). It's not a standard-compliant DOM, but it holds the same information in a more Pythonic and handier way. You can start by calling xml.etree.ElementTree.parse, which takes the XML source and returns an element tree; do that on both sources, call getroot on each element tree to obtain its root element, then recursively compare elements starting from the two roots.
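A minimal sketch of that parse, getroot, and recursive-compare sequence follows. The helper name elements_equal and the two sample documents are made up for illustration, and the choice to ignore surrounding whitespace in text is an assumption you may not want. This version is strict about child order; only attribute order is ignored, because attributes are held in a dict.

```python
import io
import xml.etree.ElementTree as ET


def elements_equal(a, b):
    """Strict comparison: same tag, attributes, text, and children in the same order.

    Hypothetical helper, not part of ElementTree. Whitespace around text
    and tail is stripped before comparing (an assumption of this sketch).
    """
    if a.tag != b.tag:
        return False
    # attrib is a dict, so this comparison already ignores attribute order
    if a.attrib != b.attrib:
        return False
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if (a.tail or "").strip() != (b.tail or "").strip():
        return False
    if len(a) != len(b):
        return False
    # Children are compared pairwise, position by position
    return all(elements_equal(x, y) for x, y in zip(a, b))


# parse() accepts a filename or a file object; in-memory files keep this self-contained
doc1 = io.BytesIO(b'<root a="1" b="2"><x>hi</x><y/></root>')
doc2 = io.BytesIO(b'<root b="2" a="1"><x>hi</x><y/></root>')

root1 = ET.parse(doc1).getroot()
root2 = ET.parse(doc2).getroot()

same = elements_equal(root1, root2)
print(same)
```

With real files you would pass the two paths to parse instead of the BytesIO objects.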
Children of an element form a sequence, in element tree just as in the standard DOM, meaning their order is considered important; but it's easy to make Python sets out of them (or with a little more effort "multi-sets" of some kind, if repetitions are important in your use case though order is not) for a laxer comparison. It's even easier for attributes for a given element, where uniqueness is assured and order is semantically not relevant.
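One way to sketch the laxer, order-insensitive comparison is to reduce each element to a hashable summary and count the children's summaries with collections.Counter, which acts as the multiset. The names canonical and equal_unordered are hypothetical, the sample documents are invented, and stripping whitespace from text (and ignoring tail text entirely) is an assumption of this sketch. ET.fromstring is used here only to parse inline strings.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def canonical(elem):
    """Return a hashable, order-independent summary of an element and its subtree.

    Hypothetical helper. Children are counted in a Counter (a multiset),
    so repetitions matter but order does not.
    """
    children = Counter(canonical(child) for child in elem)
    return (
        elem.tag,
        frozenset(elem.attrib.items()),   # attribute order is irrelevant
        (elem.text or "").strip(),        # assumption: ignore surrounding whitespace
        frozenset(children.items()),      # (child summary, count) pairs
    )


def equal_unordered(a, b):
    """Compare two elements while ignoring the order of children and attributes."""
    return canonical(a) == canonical(b)


left = ET.fromstring('<r><a k="1" j="2"/><b>t</b><b>t</b></r>')
right = ET.fromstring('<r><b>t</b><a j="2" k="1"/><b>t</b></r>')
fewer = ET.fromstring('<r><b>t</b><a j="2" k="1"/></r>')

unordered_same = equal_unordered(left, right)    # same children, different order
unordered_fewer = equal_unordered(left, fewer)   # one <b> is missing
print(unordered_same, unordered_fewer)
```

If repetitions should not matter either, replacing the Counter with a plain frozenset of child summaries gives the simpler set-based comparison.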
Is there some specific reason you need a standard DOM rather than an alternative container like an element tree, or are you just using the term DOM in a general sense so that element tree would be OK?
In the past I've also had good results using PyRXP, which uses an even starker and simpler representation than ElementTree. However, it WAS years and years ago; I have no recent experience as to how PyRXP today compares with lxml or cElementTree.
Some solutions to ponder: