
Getting DOM tree of XML document

Tags:

python

dom

xml

Does anyone know how I would get a DOM instance (tree) of an XML file in Python? I am trying to compare two XML documents to each other that may have elements and attributes in a different order. How would I do this?

asked Oct 26 '22 at 02:10 by Dave



2 Answers

Personally, whenever possible, I'd start with ElementTree (preferably the C implementation that comes with Python's standard library, or the lxml implementation, but that's essentially only a matter of higher speed). It's not a standard-compliant DOM, but it holds the same information in a more Pythonic and handier way. You can start by calling xml.etree.ElementTree.parse, which takes the XML source and returns an element tree; do that on both sources, call getroot on each tree to obtain its root element, then recursively compare elements starting from the two roots.
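A minimal sketch of that parse / getroot / recursive-compare approach, assuming that leading and trailing whitespace in text is insignificant. The helper names `elements_equal` and `xml_files_equal` are hypothetical. Attributes come back as a plain dict, so their order already doesn't matter, but child order still does at this stage:

```python
import xml.etree.ElementTree as ET

def elements_equal(a, b):
    """Recursively compare two elements; child order matters here."""
    if a.tag != b.tag:
        return False
    # attrib is a dict, so attribute order never affects this comparison.
    if a.attrib != b.attrib:
        return False
    # Assumption: missing text equals empty text, and surrounding whitespace is ignored.
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if (a.tail or "").strip() != (b.tail or "").strip():
        return False
    if len(a) != len(b):
        return False
    # Compare children pairwise, in document order.
    return all(elements_equal(ca, cb) for ca, cb in zip(a, b))

def xml_files_equal(path_a, path_b):
    """Parse both files and compare starting from their root elements."""
    root_a = ET.parse(path_a).getroot()
    root_b = ET.parse(path_b).getroot()
    return elements_equal(root_a, root_b)
```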

Children of an element form a sequence in ElementTree, just as in the standard DOM, so their order is treated as significant; but it's easy to make Python sets out of them (or, with a little more effort, "multisets" of some kind, if repetitions matter in your use case even though order does not) for a laxer comparison. It's even easier for the attributes of a given element, where uniqueness is guaranteed and order is semantically irrelevant.
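One way to get the laxer, order-insensitive comparison is to turn each element into a hashable canonical form in which the children become a multiset, here a `collections.Counter` frozen into a `frozenset` of `(child, count)` pairs. This is a sketch with some stated assumptions: `canonical` and `unordered_equal` are hypothetical names, whitespace around text is ignored, and tail text (mixed content) is dropped entirely:

```python
import xml.etree.ElementTree as ET
from collections import Counter

def canonical(elem):
    """Build a hashable, order-insensitive representation of an element.

    Children become a multiset: repeated children still count,
    but the order they appear in does not.
    """
    children = Counter(canonical(child) for child in elem)
    return (
        elem.tag,
        frozenset(elem.attrib.items()),  # attributes: unique keys, no order
        (elem.text or "").strip(),       # assumption: whitespace is insignificant
        frozenset(children.items()),     # (canonical_child, count) pairs
    )

def unordered_equal(a, b):
    """True if the two trees match up to child and attribute order."""
    return canonical(a) == canonical(b)
```

Using a plain `set` of children instead of the `Counter` would make `<r><d/><d/></r>` equal to `<r><d/></r>`; the multiset keeps those distinct.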

Is there some specific reason you need a standard DOM rather than an alternative container like an element tree, or are you just using the term DOM in a general sense so that element tree would be OK?

In the past I've also had good results using PyRXP, which uses an even starker and simpler representation than ElementTree. However, it WAS years and years ago; I have no recent experience as to how PyRXP today compares with lxml or cElementTree.

answered Nov 15 '22 at 08:11 by Alex Martelli



Some solutions to ponder:

  • minidom
  • Amara (XML data binding)
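For an actual W3C-style DOM instance, which is what the question literally asks for, `xml.dom.minidom` in the standard library is the direct route. A minimal sketch; `dom_summary` and the sample document are illustrative only:

```python
from xml.dom import minidom

def dom_summary(xml_text):
    """Parse XML into a DOM Document and pull a few facts out of it."""
    doc = minidom.parseString(xml_text)  # for a file on disk: minidom.parse(path)
    root = doc.documentElement
    names = [el.getAttribute("name") for el in root.getElementsByTagName("item")]
    return root.tagName, root.getAttribute("version"), names
```

For example, `dom_summary('<config version="2"><item name="a"/><item name="b"/></config>')` returns `('config', '2', ['a', 'b'])`.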
answered Nov 15 '22 at 08:11 by geowa4
