
More compact ElementTree or lxml Namespaces

I am trying to get a compact representation of namespaces in ElementTree or lxml when the sub-elements are in a different namespace than the parent. Here is the basic example:

from lxml import etree

country = etree.Element("country")

name = etree.SubElement(country, "{urn:test}name")
name.text = "Canada"
population = etree.SubElement(country, "{urn:test}population")
population.text = "34M"
etree.register_namespace('tst', 'urn:test')

print( etree.tostring(country, pretty_print=True) )

I also tried this approach:

ns = {"test" : "urn:test"}

country = etree.Element("country", nsmap=ns)

name = etree.SubElement(country, "{test}name")
name.text = "Canada"
population = etree.SubElement(country, "{test}population")
population.text = "34M"

print( etree.tostring(country, pretty_print=True) )

In both cases, I get something like this out:

<country>
    <ns0:name xmlns:ns0="urn:test">Canada</ns0:name>
    <ns1:population xmlns:ns1="urn:test">34M</ns1:population>
</country>

While that is correct, I would like it to be less verbose: this becomes a real issue with large data sets, especially because my actual namespace URI is much longer than 'urn:test'.

If I am OK with 'country' being inside the "urn:test" namespace and declare it like so (in the first example above):

country = etree.Element("{test}country")

then I get the following output:

<ns0:country xmlns:ns0="urn:test">
    <ns0:name>Canada</ns0:name>
    <ns0:population>34M</ns0:population>
</ns0:country>

But what I really want is this:

<country xmlns:ns0="urn:test">
    <ns0:name>Canada</ns0:name>
    <ns0:population>34M</ns0:population>
</country>

Any ideas?

Shane C. Mason asked Oct 05 '22 10:10


1 Answer

  1. The full name of an element consists of {namespace-url}elementName, not {prefix}elementName. The prefix only appears in the nsmap:

    >>> from lxml import etree as ET
    >>> r = ET.Element('root', nsmap={'tst': 'urn:test'})
    >>> ET.SubElement(r, "{urn:test}child")
    <Element {urn:test}child at 0x2592a80>
    >>> ET.tostring(r)
    '<root xmlns:tst="urn:test"><tst:child/></root>'
    
  2. In your case, an even more compact representation is possible if you set the default namespace. Unfortunately, lxml does not seem to allow an empty XML namespace, but since you say you can put the parent tag into the same namespace as the child elements, you can set the default namespace to that of the child elements:

    >>> r = ET.Element('{urn:test}root', nsmap={None: 'urn:test'})
    >>> ET.SubElement(r, "{urn:test}child")
    <Element {urn:test}child at 0x2592b20>
    >>> ET.SubElement(r, "{urn:test}child")
    <Element {urn:test}child at 0x25928f0>
    >>> ET.tostring(r)
    '<root xmlns="urn:test"><child/><child/></root>'
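
Since the title also mentions plain ElementTree, here is a minimal sketch for the standard library's xml.etree.ElementTree, applied to the country example from the question. It follows the same rule as point 1 (tags use the {namespace-url} form) and registers the tst prefix before serializing. The build_country helper name is only for illustration, and I am assuming Python 3 here. As far as I can tell, the standard-library serializer then declares the prefix once on the root element, so the result should carry a single xmlns:tst declaration on <country>.

```python
import xml.etree.ElementTree as ET

NS = "urn:test"  # namespace URI from the question


def build_country():
    """Build the question's tree and return it serialized as a str."""
    # Map the URI to a readable prefix; without this the serializer
    # falls back to generated prefixes such as ns0.
    ET.register_namespace("tst", NS)

    country = ET.Element("country")  # root stays outside the namespace
    name = ET.SubElement(country, "{%s}name" % NS)
    name.text = "Canada"
    population = ET.SubElement(country, "{%s}population" % NS)
    population.text = "34M"

    # encoding="unicode" returns str instead of bytes
    return ET.tostring(country, encoding="unicode")


if __name__ == "__main__":
    print(build_country())
```

Note that register_namespace is global to the process, so it affects every tree serialized afterwards, not just this one.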
    
newtover answered Oct 12 '22 11:10