from lxml import etree
element1 = etree.Element('{j:a}a', nsmap={None: 'j:a'})
etree.SubElement(element1, 'b')
element2 = etree.Element('{j:a}a', nsmap={None: 'j:a'})
etree.SubElement(element2, '{j:a}b')
Both elements serialise to the same string:
<a xmlns="j:a"><b/></a>
but the two elements do not behave the same way:
element1.find('b')
-> returns the Element
element2.find('b')
-> returns None
If you do it the other way around and parse the string:
etree.fromstring('<a xmlns="j:a"><b/></a>')
you get the representation from element2, so
element2.find('b')
-> returns None
which seems consistent, because there is no namespaceless <b/> in the tree: <b/> inherits the default namespace from <a/>.
So what's the purpose of the representation in element1? It seems to add a namespaceless subelement <b/> and behaves that way. But when serialised, the element inherits the namespace from <a>.
Why does this exist if it does not serialise anyway?
XML tags can (but need not) have a namespace. So even if the root node defines a default namespace, child nodes are allowed to have no namespace, which is not equivalent to being in the default namespace.
This is the difference between your element1 and element2: element1's subelement has no namespace; element2's subelement is in the default namespace, because you specified that namespace when you created it. If you try
element2.find("{j:a}b")
-> returns the element b
or, to be more accurate, the element {j:a}b.
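To make the difference visible, here is a minimal sketch that inspects the tags lxml stores for the two subelements. The variable names child1 and child2 are mine, not from the question; the "x" prefix in the last call is an arbitrary placeholder.

```python
from lxml import etree

NS = "j:a"

# element1: the child is created without any namespace
element1 = etree.Element("{%s}a" % NS, nsmap={None: NS})
child1 = etree.SubElement(element1, "b")

# element2: the child is created explicitly in the default namespace
element2 = etree.Element("{%s}a" % NS, nsmap={None: NS})
child2 = etree.SubElement(element2, "{%s}b" % NS)

print(child1.tag)  # b
print(child2.tag)  # {j:a}b

# QName splits a tag into its namespace URI and local name
print(etree.QName(child1).namespace)  # None
print(etree.QName(child2).namespace)  # j:a

# find() matches on the qualified name stored in the tree
print(element1.find("b").tag)         # b
print(element2.find("b"))             # None
print(element2.find("{j:a}b").tag)    # {j:a}b

# The same lookup using a prefix mapped to the namespace URI
print(element2.find("x:b", namespaces={"x": NS}).tag)  # {j:a}b
```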
So yes, namespace matters. And when you create the elements with lxml, you can define elements without namespace: just don't add it.
Now, I am not an lxml expert, so this is just my guess. The thing is, when you serialize the element, the output does not distinguish between elements that really have no namespace and elements in the default namespace, so both are written the same way.
Consequently, serializing an element and then parsing it again cannot give the original result. If, for example, using your element1 you do:
sel1 = etree.tostring(element1)
element1s = etree.fromstring(sel1)
it turns out that element1s is not equal to element1, because the subelement b is now the subelement {j:a}b. When the string is parsed, elements without a prefix are placed in the default namespace.
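A small sketch for checking the round trip yourself. The tags() helper is hypothetical, written only for this illustration; it collects the tag of every element so the trees can be compared before and after. Going by the behaviour described above, the second list should show {j:a}b where the first shows b, but the exact output may depend on your lxml version.

```python
from lxml import etree

element1 = etree.Element("{j:a}a", nsmap={None: "j:a"})
etree.SubElement(element1, "b")  # child without a namespace


def tags(root):
    """Return the tag of every element in the tree, in document order."""
    return [el.tag for el in root.iter()]


before = tags(element1)

# Serialise and parse again
reparsed = etree.fromstring(etree.tostring(element1))
after = tags(reparsed)

print(before)           # ['{j:a}a', 'b']
print(after)            # compare the second entry with the list above
print(before == after)  # False means the round trip changed the tree
```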
Now, I don't know if this is intended or is a bug. To the best of my knowledge, if an XML document declares a default namespace, all elements which do not explicitly have a different namespace should be considered to be in the default namespace, which is what happens when you parse an XML document with the fromstring function. An element can have no namespace only if no default namespace is in scope for it (none is declared, or it has been reset with xmlns="").
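One caveat to the statement above: the XML Namespaces specification lets an element undeclare the default namespace with xmlns="", so a document can hold a namespaceless element under a parent that declares a default namespace. A minimal sketch; the element names b and c are just for illustration:

```python
from lxml import etree

# <b> resets the default namespace with xmlns="", <c> inherits it from <a>
root = etree.fromstring('<a xmlns="j:a"><b xmlns=""/><c/></a>')

print([child.tag for child in root])  # ['b', '{j:a}c']

print(root.find("b").tag)   # b
print(root.find("c"))       # None, because <c> is really {j:a}c
```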
So in my opinion the b subelement of element1 should "inherit" the namespace of the parent node, since the parent node defines a default namespace with nsmap={None: "j:a"}.
But you could also be told that since you are building the document using lxml elements, it's your responsibility to put each element in the correct namespace, which means you have to add the default namespace explicitly.
Since elements without namespaces are allowed by XML under some circumstances, lxml does not complain when an element does not have a namespace.
I think that automatically adding the default namespace to subelements of elements which declare a default namespace would be a cool feature, but it's just not there.
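If you want that behaviour, a small wrapper can approximate it. This is only a sketch: sub_element_default_ns is a hypothetical helper name, not part of lxml. It relies on the nsmap property of the parent, which also includes namespaces inherited from ancestors.

```python
from lxml import etree


def sub_element_default_ns(parent, local_name):
    """Create a child in the parent's default namespace, if it has one.

    Hypothetical helper, not an lxml API.
    """
    default_ns = parent.nsmap.get(None)  # None when no default namespace is in scope
    if default_ns:
        tag = "{%s}%s" % (default_ns, local_name)
    else:
        tag = local_name
    return etree.SubElement(parent, tag)


root = etree.Element("{j:a}a", nsmap={None: "j:a"})
b = sub_element_default_ns(root, "b")

print(b.tag)                     # {j:a}b
print(root.find("{j:a}b").tag)   # {j:a}b
print(root.find("b"))            # None

# Without a default namespace the helper falls back to a plain tag
plain = etree.Element("a")
print(sub_element_default_ns(plain, "b").tag)  # b
```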