
Testing Equivalence of xml.etree.ElementTree

I want to test two XML elements for equivalence. I've found that comparing the tostring output of the elements works; however, that seems hacky.

Is there a better way to test equivalence of two etree Elements?

Comparing Elements directly:

    import xml.etree.ElementTree as etree

    h1 = etree.Element('hat', {'color': 'red'})
    h2 = etree.Element('hat', {'color': 'red'})

    h1 == h2  # False

Comparing Elements as strings:

    etree.tostring(h1) == etree.tostring(h2)  # True

This compare function works for me:

    def elements_equal(e1, e2):
        if e1.tag != e2.tag: return False
        if e1.text != e2.text: return False
        if e1.tail != e2.tail: return False
        if e1.attrib != e2.attrib: return False
        if len(e1) != len(e2): return False
        return all(elements_equal(c1, c2) for c1, c2 in zip(e1, e2))
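To illustrate what this function does and does not treat as equal, here is a small self-contained check. The sample elements (`h1`, `h2`, `h3`) are made up for the demonstration. Since `attrib` is a plain dict, attribute order should not matter, while whitespace-only text does count as a difference.

```python
import xml.etree.ElementTree as ET


def elements_equal(e1, e2):
    # Same function as above, repeated so the snippet runs on its own.
    if e1.tag != e2.tag: return False
    if e1.text != e2.text: return False
    if e1.tail != e2.tail: return False
    if e1.attrib != e2.attrib: return False
    if len(e1) != len(e2): return False
    return all(elements_equal(c1, c2) for c1, c2 in zip(e1, e2))


# Hypothetical sample data, not from the original post.
h1 = ET.XML('<hat color="blue" price="39.90"/>')
h2 = ET.XML('<hat price="39.90" color="blue"/>')   # same attributes, other order
h3 = ET.XML('<hat color="blue" price="39.90"> </hat>')  # text is a single space

print(elements_equal(h1, h2))  # True: dict comparison ignores attribute order
print(elements_equal(h1, h3))  # False: h1.text is None, h3.text is ' '
```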

Comparing strings doesn't always work. The order of the attributes should not matter for considering two nodes equivalent. However, if you do string comparison, the order obviously matters.

I'm not sure if it is a problem or a feature, but my version of lxml.etree preserves the order of the attributes if they are parsed from a file or a string:

    >>> from lxml import etree
    >>> h1 = etree.XML('<hat color="blue" price="39.90"/>')
    >>> h2 = etree.XML('<hat price="39.90" color="blue"/>')
    >>> etree.tostring(h1) == etree.tostring(h2)
    False

This might be version-dependent (I use Python 2.7.3 with lxml.etree 2.3.2 on Ubuntu); I remember that I couldn't find a way of controlling the order of the attributes a year ago or so, when I wanted to (for readability reasons).

As I need to compare XML files that were produced by different serializers, I see no other way than recursively comparing tag, text, attributes, and children of every node. And of course tail, if there's anything interesting there.
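When comparing whole files, a plain True/False is often less useful than knowing where the trees diverge. The following is only a sketch of that recursive walk; the function name `first_difference` and the simple slash-separated path format are my own choices, not part of any library. It uses the standard `xml.etree.ElementTree`, but the same attributes (`tag`, `text`, `tail`, `attrib`) exist on lxml elements.

```python
import xml.etree.ElementTree as ET


def first_difference(e1, e2, path=""):
    """Return a short description of the first difference, or None if equal."""
    path = f"{path}/{e1.tag}"
    if e1.tag != e2.tag:
        return f"{path}: tag {e1.tag!r} != {e2.tag!r}"
    # text, tail and attrib are compared the same way; attrib is a dict,
    # so attribute order does not matter here.
    for name in ("text", "tail", "attrib"):
        v1, v2 = getattr(e1, name), getattr(e2, name)
        if v1 != v2:
            return f"{path}: {name} {v1!r} != {v2!r}"
    if len(e1) != len(e2):
        return f"{path}: {len(e1)} children != {len(e2)} children"
    # Children are compared pairwise, in document order.
    for c1, c2 in zip(e1, e2):
        diff = first_difference(c1, c2, path)
        if diff is not None:
            return diff
    return None


# Hypothetical sample data for illustration.
a = ET.XML('<shop><hat color="blue" price="39.90"/></shop>')
b = ET.XML('<shop><hat price="39.90" color="red"/></shop>')

print(first_difference(a, a))  # None
print(first_difference(a, b))  # reports the attrib mismatch at /shop/hat
```

The path does not include sibling indices, so with many same-named children it only narrows the location down to the parent.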

Comparison of lxml and xml.etree.ElementTree

The truth is that it may be implementation dependent. Apparently, lxml uses an ordered dict or something similar, whereas the standard xml.etree.ElementTree does not preserve the order of attributes:

    Python 2.7.1 (r271:86832, Nov 27 2010, 17:19:03) [MSC v.1500 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from lxml import etree
    >>> h1 = etree.XML('<hat color="blue" price="39.90"/>')
    >>> h2 = etree.XML('<hat price="39.90" color="blue"/>')
    >>> etree.tostring(h1) == etree.tostring(h2)
    False
    >>> etree.tostring(h1)
    '<hat color="blue" price="39.90"/>'
    >>> etree.tostring(h2)
    '<hat price="39.90" color="blue"/>'
    >>> etree.dump(h1)
    <hat color="blue" price="39.90"/>>>> etree.dump(h2)
    <hat price="39.90" color="blue"/>>>>

(Yes, the newlines are missing. But it is a minor problem.)

    >>> import xml.etree.ElementTree as ET
    >>> h1 = ET.XML('<hat color="blue" price="39.90"/>')
    >>> h1
    <Element 'hat' at 0x2858978>
    >>> h2 = ET.XML('<hat price="39.90" color="blue"/>')
    >>> ET.dump(h1)
    <hat color="blue" price="39.90" />
    >>> ET.dump(h2)
    <hat color="blue" price="39.90" />
    >>> ET.tostring(h1) == ET.tostring(h2)
    True
    >>> ET.dump(h1) == ET.dump(h2)
    <hat color="blue" price="39.90" />
    <hat color="blue" price="39.90" />
    True
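Note that the sessions above are from Python 2.7. Since Python 3.8 the standard ElementTree serializer keeps attributes in the order they were given, so the plain tostring comparison should no longer be order-insensitive there. If a string comparison is still wanted, one option on Python 3.8+ is `xml.etree.ElementTree.canonicalize()`, which writes C14N 2.0 output with attributes in a fixed order. A minimal sketch, where the helper name `canonical` is my own:

```python
import xml.etree.ElementTree as ET  # canonicalize() requires Python 3.8+


def canonical(elem):
    """Serialize an Element to its C14N 2.0 string form (hypothetical helper)."""
    return ET.canonicalize(ET.tostring(elem, encoding="unicode"))


h1 = ET.XML('<hat color="blue" price="39.90"/>')
h2 = ET.XML('<hat price="39.90" color="blue"/>')

# The plain serialization may or may not match, depending on the Python version.
print(ET.tostring(h1) == ET.tostring(h2))

# The canonical form does not depend on the original attribute order.
print(canonical(h1) == canonical(h2))  # True
```

Canonicalization still treats whitespace in text as significant by default, so it does not replace a custom comparison when such differences should be ignored.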

Another question is what should be considered unimportant when comparing. For example, some fragments may contain extra whitespace that we do not want to care about. For that reason, it is usually better to write your own serializing or comparison function that works exactly as you need.

Tags:

python

python-3.x

elementtree

oneporter

2 Answers

Itamar

lenz
