Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Keep Existing Namespaces when overwriting XML file with ElementTree and Python

I have an XML file in the following format

<?xml version="1.0" encoding="utf-8"?>
<foo>
   <bar>
      <bat>1</bat>
   </bar>
   <a>
      <b xmlns="urn:schemas-microsoft-com:asm.v1">
         <c>1</c>
      </b>
   </a>
</foo>

I want to change the value of bat to '2' and change the file to this:

<?xml version="1.0" encoding="utf-8"?>
<foo>
   <bar>
      <bat>2</bat>
   </bar>
   <a>
      <b xmlns="urn:schemas-microsoft-com:asm.v1">
         <c>1</c>
      </b>
   </a>
</foo>

I open this file by doing this

tree = ET.parse(filePath)
root = tree.getroot()

I then change the value of bat to '2' and save the file like this:

tree.write(filePath, "utf-8", True, None, "xml")

The value of bat successfully changes to 2, but the XML file now looks like this.

<?xml version="1.0" encoding="utf-8"?>
<foo xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
   <bar>
      <bat>2</bat>
   </bar>
   <a>
      <ns0:b>
         <ns0:c>1</ns0:c>
      </ns0:b>
   </a>
</foo>

In order to fix the issue of having a namespace named ns0, I do the following before parsing the document

ET.register_namespace('', "urn:schemas-microsoft-com:asm.v1")

This gets rid of the ns0 namepace but the xml file now looks like this

<?xml version="1.0" encoding="utf-8"?>
<foo xmlns="urn:schemas-microsoft-com:asm.v1">
   <bar>
      <bat>2</bat>
   </bar>
   <a>
      <b>
         <c>1</c>
      </b>
   </a>
</foo>

What do I do to get the output I need?

like image 861
Sachin Kainth Avatar asked Jul 29 '16 16:07

Sachin Kainth


People also ask

What is XML Etree ElementTree in Python?

The xml.etree.ElementTree module implements a simple and efficient API for parsing and creating XML data. Changed in version 3.3: This module will use a fast implementation whenever available.

Can XML have multiple namespaces?

When you use multiple namespaces in an XML document, you can define one namespace as the default namespace to create a cleaner looking document. The default namespace is declared in the root element and applies to all unqualified elements in the document. Default namespaces apply to elements only, not to attributes.

What is the correct way of declaring and XML namespace?

XML Namespaces - The xmlns Attribute When using prefixes in XML, a namespace for the prefix must be defined. The namespace can be defined by an xmlns attribute in the start tag of an element. The namespace declaration has the following syntax. xmlns:prefix="URI".


2 Answers

As far as i know there isn't a way by the means of xml.etree.ElementTree methods to achieve your goal. By digging in the xml.etree source code and the xml specification I found that the library behaviour is not wrong, nor unreasonable. Anyway it does not allows the output you are looking for.

To achieve your goal using that library you have to customize rendering behaviour. To best suite your needs I have written the following render function.

from xml.etree import ElementTree as ET
from re import findall, sub

def render(root, buffer='', namespaces=None, level=0, indent_size=2, encoding='utf-8'):
    buffer += f'<?xml version="1.0" encoding="{encoding}" ?>\n' if not level else ''
    root = root.getroot() if isinstance(root, ET.ElementTree) else root
    _, namespaces = ET._namespaces(root) if not level else (None, namespaces)
    for element in root.iter():
        indent = ' ' * indent_size * level
        tag = sub(r'({[^}]+}\s*)*', '', element.tag)
        buffer += f'{indent}<{tag}'
        for ns in findall(r'{[^}]+}', element.tag):
            ns_key = ns[1:-1]
            if ns_key not in namespaces: continue
            buffer += ' xmlns' + (f':{namespaces[ns_key]}' if namespaces[ns_key] != '' else '') + f'="{ns_key}"'
            del namespaces[ns_key]
        for k, v in element.attrib.items():
            buffer += f' {k}="{v}"'
        buffer += '>' + element.text.strip() if element.text else '>'
        children = list(element)
        for child in children:
            sep = '\n' if buffer[-1] != '\n' else ''
            buffer += sep + render(child, level=level+1, indent_size=indent_size, namespaces=namespaces)
        buffer += f'{indent}</{tag}>\n' if 0 != len(children) else f'</{tag}>\n'
    return buffer

By supplying to the above render() function your xml input data as follows:

data =\ 
'''<?xml version="1.0" encoding="utf-8"?>
<foo>
   <bar>
      <bat>1</bat>
   </bar>
   <a>
      <b xmlns="urn:schemas-microsoft-com:asm.v1">
         <c>1</c>
      </b>
   </a>
</foo>'''

root = ET.ElementTree(ET.fromstring(data))
ET.register_namespace('', "urn:schemas-microsoft-com:asm.v1")
print(render(root))

It prints out the output your are looking for:

<?xml version="1.0" encoding="utf-8" ?>
<foo>
  <bar>
    <bat>1</bat>
  </bar>
  <a>
    <b xmlns="urn:schemas-microsoft-com:asm.v1">
      <c>1</c>
    </b>
  </a>
</foo>
like image 51
Giova Avatar answered Oct 19 '22 11:10

Giova


Using package lxml can helps solve your problem. An example with original/modified xml file and python code (using lxml) package, with the namespace/xml structure unchanged, has been provided here: example with namespace/xml structure unchanged

like image 24
XYZ Avatar answered Oct 19 '22 13:10

XYZ