Our project gets from upstream XML of this form:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-6.0.0.0" newVersion="7.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
  <appSettings>
    <add key="foo" value="default">
    ...
  </appSettings>
</configuration>
It then reads and parses this XML using ElementTree, and then, for every app setting matching a certain key ("foo"), it writes a new value that it knows about and the upstream process doesn't (in this case, key "foo" should have the value "bar").
The downstream process consuming the filtered XML is, aaahhhh... fragile. It expects to receive the XML in exactly the form above.
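For reference, the update step itself looks roughly like the sketch below. It assumes the settings live in un-namespaced <appSettings>/<add> elements, as in the sample above; `set_app_setting` is a hypothetical helper name used only for illustration, and the inline document is a trimmed-down stand-in for the real one.

```python
from xml.etree import ElementTree as ET


def set_app_setting(root, key, value):
    """Set value= on every <appSettings>/<add> whose key= matches; return the count."""
    changed = 0
    for add in root.iterfind('./appSettings/add'):
        if add.get('key') == key:
            add.set('value', value)
            changed += 1
    return changed


# Trimmed-down stand-in for the upstream document (assumption, not the real file).
doc = ET.fromstring(
    '<configuration><appSettings>'
    '<add key="foo" value="default" /><add key="other" value="x" />'
    '</appSettings></configuration>'
)
changed = set_app_setting(doc, 'foo', 'bar')
values = {a.get('key'): a.get('value') for a in doc.iter('add')}
print(changed, values)
```

That part works fine; the trouble only starts when the tree is written back out.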
If I parse this XML without registering a namespace, then ElementTree mangles my tree like this on input:
<configuration xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
  <runtime>
    <ns0:assemblyBinding>
      <ns0:dependentAssembly>
        <ns0:assemblyIdentity culture="neutral" name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" />
        <ns0:bindingRedirect newVersion="7.0.0.0" oldVersion="0.0.0.0-6.0.0.0" />
      </ns0:dependentAssembly>
    </ns0:assemblyBinding>
  </runtime>
  <appSettings>
    <add key="foo" value="default">
    ...
  </appSettings>
</configuration>
The downstream process can't handle this, because it's not clever enough to realize that, semantically, this is the same thing. So I decided to register the namespace I know the upstream process will provide as a default namespace, to avoid the prefixes showing up everywhere, and now I get this:
<configuration xmlns="urn:schemas-microsoft-com:asm.v1">
  <runtime>
    <assemblyBinding>
      <dependentAssembly>
        <assemblyIdentity culture="neutral" name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" />
        <bindingRedirect newVersion="7.0.0.0" oldVersion="0.0.0.0-6.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
  <appSettings>
    <add key="foo" value="default">
    ...
  </appSettings>
</configuration>
I don't know much about XML, but the downstream component cries about this too. Doesn't this now mean that the default xmlns applies to all elements inside <configuration>, whereas before it applied only to the <assemblyBinding> element?
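One way to check that suspicion, as a minimal sketch on a trimmed-down version of the document (the shortened XML and the variable names are assumptions for illustration): serialize with the registered default namespace, parse the result again, and compare the root tags.

```python
from xml.etree import ElementTree as ET

NS = 'urn:schemas-microsoft-com:asm.v1'

# Trimmed-down stand-in: only <assemblyBinding> and its children are namespaced.
source = (
    '<configuration><runtime>'
    f'<assemblyBinding xmlns="{NS}"><dependentAssembly /></assemblyBinding>'
    '</runtime><appSettings /></configuration>'
)

original = ET.fromstring(source)

# Register the namespace with an empty prefix, then serialize and re-parse.
ET.register_namespace('', NS)
dumped = ET.tostring(original, encoding='unicode')
reparsed = ET.fromstring(dumped)

print(original.tag)  # root tag as parsed from the upstream document
print(reparsed.tag)  # root tag after the round trip
```

If the two printed tags differ, the round trip has pulled <configuration> into the namespace, which would mean the rewritten document is not equivalent to the original, not just formatted differently.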
Is there any way, using ElementTree, to handle this namespace so that I can take in the upstream's XML, set foo's value, and then pass that on downstream, without moving the namespace around, leaving it exactly as I found it?
I could use an lxml-based solution, which seems to handle this, however, lxml has a dependency on C which the downstream component would really like not to have to support: a pure Python solution is preferable.
I could read the document as HTML which would ignore the namespace attribute, let me manipulate the value I want, and then pass on the document; however, I have yet to find a Python parser that doesn't downcase all the element names, and my downstream component requires the casing on all element names to be preserved.
I could resort to string parsing and regular expressions. I would rather not write my own parser.
The only advice I could find so far about namespace handling in ElementTree suggests the "register a default namespace to avoid prefixes" approach, which I assumed would be suitable, but ElementTree then insists on moving the xmlns declaration up to the root node upon dumping.
I could also be clever and build up a string that dumps the tree out in stages, in exactly the right order to put the xmlns declaration back on the "right node", but that strikes me, also, as pretty darned fragile.
Has anyone managed to get past a problem like this?
Python's standard library can parse XML documents with, among others, the xml.etree.ElementTree module and xml.dom.minidom (Minimal DOM Implementation).
ElementTree parses an XML document into a tree structure that is easy to navigate and work with.
As far as I know, the solution that best suits your needs is to write a custom renderer in pure Python using the features exposed by xml.etree.ElementTree. Here is one possible solution:
from xml.etree import ElementTree as ET
from re import findall, sub

def render(root, buffer='', namespaces=None, level=0, indent_size=2, encoding='utf-8'):
    # Emit the XML declaration once, at the top level only.
    buffer += f'<?xml version="1.0" encoding="{encoding}" ?>\n' if not level else ''
    element = root.getroot() if isinstance(root, ET.ElementTree) else root
    # Collect the uri -> prefix map once, at the top level.
    # Note: ET._namespaces is a private helper and may change between Python versions.
    if not level:
        _, namespaces = ET._namespaces(element)
    indent = ' ' * indent_size * level
    # Strip the '{uri}' part from the tag name.
    tag = sub(r'({[^}]+}\s*)*', '', element.tag)
    buffer += f'{indent}<{tag}'
    # Declare each namespace on the first element that uses it, then forget it.
    for ns in findall(r'{[^}]+}', element.tag):
        ns_key = ns[1:-1]
        if ns_key not in namespaces:
            continue
        buffer += ' xmlns' + (f':{namespaces[ns_key]}' if namespaces[ns_key] != '' else '') + f'="{ns_key}"'
        del namespaces[ns_key]
    # Attribute values are written verbatim (no escaping).
    for k, v in element.attrib.items():
        buffer += f' {k}="{v}"'
    buffer += '>' + element.text.strip() if element.text else '>'
    # Render the children recursively, one indentation level deeper.
    children = list(element)
    for child in children:
        sep = '\n' if buffer[-1] != '\n' else ''
        buffer += sep + render(child, level=level+1, indent_size=indent_size, namespaces=namespaces)
    buffer += f'{indent}</{tag}>\n' if 0 != len(children) else f'</{tag}>\n'
    return buffer
Pass the XML data you gave to the render function above, as shown below:
data = '''<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-6.0.0.0" newVersion="7.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
  <appSettings>
    <add key="foo" value="default" />
  </appSettings>
</configuration>'''
e = ET.fromstring(data)
# Register the namespace with an empty prefix so it is rendered as a default xmlns.
ET.register_namespace('', "urn:schemas-microsoft-com:asm.v1")
r = ET.ElementTree(e)
print(render(r))
You'll get the following XML, which has the properties you said you are looking for:
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral"></assemblyIdentity>
        <bindingRedirect oldVersion="0.0.0.0-6.0.0.0" newVersion="7.0.0.0"></bindingRedirect>
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
  <appSettings>
    <add key="foo" value="default"></add>
  </appSettings>
</configuration>
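Note that the output above writes empty elements as open/close pairs rather than as self-closing tags. If the downstream component is picky about that too, a variation on the same idea may help. The sketch below is not part of the render function above: `split_tag` and `serialize` are hypothetical names, and it assumes the input only uses default `xmlns="..."` declarations (no prefixes, no namespaced attributes) and contains no comments or processing instructions. It declares `xmlns` wherever an element's namespace differs from its parent's, keeps the original whitespace, writes empty elements as ` />`, and sets foo to bar along the way.

```python
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape

QUOT = {'"': '&quot;'}  # extra entity so attribute values stay well-formed


def split_tag(tag):
    """Split '{uri}local' into (uri, local); uri is '' for un-namespaced tags."""
    if tag.startswith('{'):
        uri, local = tag[1:].split('}', 1)
        return uri, local
    return '', tag


def serialize(element, parent_ns=''):
    """Serialize a tree, declaring a default xmlns wherever the namespace changes."""
    ns, local = split_tag(element.tag)
    out = f'<{local}'
    if ns != parent_ns:
        out += f' xmlns="{escape(ns, QUOT)}"'
    for key, value in element.attrib.items():
        out += f' {key}="{escape(value, QUOT)}"'
    children = list(element)
    if not children and not element.text:
        return out + ' />'  # empty element: keep the self-closing form
    out += '>' + escape(element.text or '')
    for child in children:
        # Text following a child (its tail) is preserved verbatim.
        out += serialize(child, ns) + escape(child.tail or '')
    return out + f'</{local}>'


source = '''<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-6.0.0.0" newVersion="7.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
  <appSettings>
    <add key="foo" value="default" />
  </appSettings>
</configuration>'''

root = ET.fromstring(source)
for add in root.iterfind('./appSettings/add'):
    if add.get('key') == 'foo':
        add.set('value', 'bar')

result = serialize(root)
print(result)
```

The XML declaration is left out here; if the consumer needs it, prepend the exact line it expects. No global namespace registration is needed, because the namespace is read straight from each element's tag.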
I know I'm late to the party. Anyway, I hope this helps you and anyone else with the same issue. Happy coding!