Testing Equivalence of xml.etree.ElementTree

I'm interested in the equivalence of two XML elements. I've found that comparing the tostring() output of the elements works; however, that seems hacky.

Is there a better way to test equivalence of two etree Elements?

Comparing Elements directly:

    import xml.etree.ElementTree as etree

    h1 = etree.Element('hat', {'color': 'red'})
    h2 = etree.Element('hat', {'color': 'red'})

    h1 == h2  # False

Comparing Elements as strings:

    etree.tostring(h1) == etree.tostring(h2)  # True
asked Oct 26 '11 at 15:10 by oneporter




2 Answers

This compare function works for me:

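    def elements_equal(e1, e2):
        # Compare the element itself: tag, text, tail and attributes.
        # attrib is a dict, so attribute order does not matter here.
        if e1.tag != e2.tag: return False
        if e1.text != e2.text: return False
        if e1.tail != e2.tail: return False
        if e1.attrib != e2.attrib: return False
        # Same number of children, then compare them pairwise, recursively.
        if len(e1) != len(e2): return False
        return all(elements_equal(c1, c2) for c1, c2 in zip(e1, e2))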
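A quick check of the function against the question's example, against two parses that differ only in attribute order, and against two trees whose text differs. The function is repeated so the snippet runs on its own; the sample XML is purely illustrative.

```python
import xml.etree.ElementTree as etree


def elements_equal(e1, e2):
    # Same checks as above: tag, text, tail, attributes, then children.
    if e1.tag != e2.tag: return False
    if e1.text != e2.text: return False
    if e1.tail != e2.tail: return False
    if e1.attrib != e2.attrib: return False
    if len(e1) != len(e2): return False
    return all(elements_equal(c1, c2) for c1, c2 in zip(e1, e2))


# The two elements from the question: distinct objects, same content.
h1 = etree.Element('hat', {'color': 'red'})
h2 = etree.Element('hat', {'color': 'red'})

# Same attributes written in a different order.
a = etree.XML('<hat color="blue" price="39.90"/>')
b = etree.XML('<hat price="39.90" color="blue"/>')

# Same structure, but the child's text differs.
c = etree.XML('<hat><label>red</label></hat>')
d = etree.XML('<hat><label>blue</label></hat>')

print(elements_equal(h1, h2))  # True
print(elements_equal(a, b))    # True
print(elements_equal(c, d))    # False
```

Note that `text` and `tail` are compared exactly, so pretty-printed and compact versions of the same document will not be considered equal.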
answered Oct 08 '22 at 13:10 by Itamar


Comparing strings doesn't always work. The order of the attributes should not matter when deciding whether two nodes are equivalent, but with string comparison the order obviously matters.

I'm not sure if it is a problem or a feature, but my version of lxml.etree preserves the order of the attributes if they are parsed from a file or a string:

    >>> from lxml import etree
    >>> h1 = etree.XML('<hat color="blue" price="39.90"/>')
    >>> h2 = etree.XML('<hat price="39.90" color="blue"/>')
    >>> etree.tostring(h1) == etree.tostring(h2)
    False

This might be version-dependent (I use Python 2.7.3 with lxml.etree 2.3.2 on Ubuntu). About a year ago, when I wanted to control the order of the attributes for readability reasons, I couldn't find a way to do it.

As I need to compare XML files that were produced by different serializers, I see no other way than recursively comparing the tag, text, attributes, and children of every node, and of course the tail, if there's anything interesting there.
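A minimal sketch of such a recursive comparison, assuming that leading and trailing whitespace in `text` and `tail` is insignificant (which is typically where different serializers disagree). The names `elements_equivalent` and `_norm` are hypothetical. Attributes are compared as dicts, so their order is ignored, unlike a byte-for-byte `tostring()` check.

```python
import xml.etree.ElementTree as ET


def _norm(s):
    # Treat None and whitespace-only strings alike and ignore surrounding
    # whitespace (an assumption about what counts as insignificant).
    return (s or '').strip()


def elements_equivalent(e1, e2):
    """Recursively compare tag, attributes, text, tail and children.

    Attribute order is ignored because attrib is compared as a dict;
    surrounding whitespace in text and tail is ignored via _norm().
    """
    if e1.tag != e2.tag or e1.attrib != e2.attrib:
        return False
    if _norm(e1.text) != _norm(e2.text) or _norm(e1.tail) != _norm(e2.tail):
        return False
    if len(e1) != len(e2):
        return False
    return all(elements_equivalent(c1, c2) for c1, c2 in zip(e1, e2))


# The same document from two "serializers": one pretty-printed, one
# compact, with the attributes in a different order.
pretty = ET.fromstring('<order>\n  <hat color="blue" price="39.90"/>\n</order>')
compact = ET.fromstring('<order><hat price="39.90" color="blue"/></order>')

print(ET.tostring(pretty) == ET.tostring(compact))  # False
print(elements_equivalent(pretty, compact))         # True
```

Whether stripping is right depends on the data: for mixed content, where whitespace inside text is meaningful, `_norm` would need to be stricter.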

Comparison of lxml and xml.etree.ElementTree

The truth is that it may be implementation-dependent. Apparently lxml uses an ordered dict or something similar, while the standard xml.etree.ElementTree does not preserve the order of attributes:

    Python 2.7.1 (r271:86832, Nov 27 2010, 17:19:03) [MSC v.1500 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from lxml import etree
    >>> h1 = etree.XML('<hat color="blue" price="39.90"/>')
    >>> h2 = etree.XML('<hat price="39.90" color="blue"/>')
    >>> etree.tostring(h1) == etree.tostring(h2)
    False
    >>> etree.tostring(h1)
    '<hat color="blue" price="39.90"/>'
    >>> etree.tostring(h2)
    '<hat price="39.90" color="blue"/>'
    >>> etree.dump(h1)
    <hat color="blue" price="39.90"/>>>> etree.dump(h2)
    <hat price="39.90" color="blue"/>>>>

(Yes, the newlines after the dump() output are missing, but that is a minor problem.)

    >>> import xml.etree.ElementTree as ET
    >>> h1 = ET.XML('<hat color="blue" price="39.90"/>')
    >>> h1
    <Element 'hat' at 0x2858978>
    >>> h2 = ET.XML('<hat price="39.90" color="blue"/>')
    >>> ET.dump(h1)
    <hat color="blue" price="39.90" />
    >>> ET.dump(h2)
    <hat color="blue" price="39.90" />
    >>> ET.tostring(h1) == ET.tostring(h2)
    True
    >>> ET.dump(h1) == ET.dump(h2)
    <hat color="blue" price="39.90" />
    <hat color="blue" price="39.90" />
    True
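The session above is Python 2.7, where ElementTree sorted attributes when serializing. Since Python 3.8, `tostring()` keeps the order in which attributes were parsed or set, so on current versions the string comparison gives False here as well, while comparing the `attrib` dicts still succeeds. A small check, assuming Python 3.8 or later:

```python
import xml.etree.ElementTree as ET

# Assumes Python 3.8+, where serialization preserves attribute order.
h1 = ET.XML('<hat color="blue" price="39.90"/>')
h2 = ET.XML('<hat price="39.90" color="blue"/>')

print(ET.tostring(h1))  # b'<hat color="blue" price="39.90" />'
print(ET.tostring(h2))  # b'<hat price="39.90" color="blue" />'

# The serialized forms differ, but the attribute dicts are equal
# because dict comparison ignores insertion order.
print(ET.tostring(h1) == ET.tostring(h2))  # False
print(h1.attrib == h2.attrib)              # True
```

So on modern Python the stdlib behaves like lxml in this respect, and `tostring()` equality is not a reliable equivalence test with either library.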

Another question is what should be considered unimportant when comparing. For example, some fragments may contain extra whitespace that we do not care about. For this reason, it is always better to write a serializing function that works exactly as we need.
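One ready-made serializer of this kind is C14N 2.0 canonicalization, which the standard library exposes as `xml.etree.ElementTree.canonicalize()` since Python 3.8. This swaps in the standard's normalization rules instead of a hand-written serializer. Canonical output writes attributes in sorted order and expands empty elements to start/end tag pairs, and `strip_text=True` drops whitespace around text content. The helper name `canonical_equal` is hypothetical.

```python
import xml.etree.ElementTree as ET


def canonical_equal(xml1, xml2, **options):
    # canonicalize() returns a str when no output file is given.
    # Extra keyword options (e.g. strip_text) are passed through unchanged.
    return ET.canonicalize(xml1, **options) == ET.canonicalize(xml2, **options)


a = '<order>\n  <hat color="blue" price="39.90"/>\n</order>'
b = '<order><hat price="39.90" color="blue"></hat></order>'

print(canonical_equal(a, b))                   # False: whitespace text is kept
print(canonical_equal(a, b, strip_text=True))  # True
print(ET.canonicalize(b))  # <order><hat color="blue" price="39.90"></hat></order>
```

Canonicalization works on serialized XML (a string or a file), so an `Element` would first need to go through `ET.tostring(elem, encoding='unicode')`.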

answered Oct 08 '22 at 13:10 by lenz