Keep Existing Namespaces when overwriting XML file with ElementTree and Python

Tags:

python

xml

elementtree

I have an XML file in the following format:

<?xml version="1.0" encoding="utf-8"?>
<foo>
   <bar>
      <bat>1</bat>
   </bar>
   <a>
      <b xmlns="urn:schemas-microsoft-com:asm.v1">
         <c>1</c>
      </b>
   </a>
</foo>

I want to change the value of bat to '2' and change the file to this:

<?xml version="1.0" encoding="utf-8"?>
<foo>
   <bar>
      <bat>2</bat>
   </bar>
   <a>
      <b xmlns="urn:schemas-microsoft-com:asm.v1">
         <c>1</c>
      </b>
   </a>
</foo>

I open the file like this:

import xml.etree.ElementTree as ET

tree = ET.parse(filePath)
root = tree.getroot()

I then change the value of bat to '2' and save the file like this:

tree.write(filePath, "utf-8", True, None, "xml")

The value of bat successfully changes to 2, but the XML file now looks like this:

<?xml version="1.0" encoding="utf-8"?>
<foo xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
   <bar>
      <bat>2</bat>
   </bar>
   <a>
      <ns0:b>
         <ns0:c>1</ns0:c>
      </ns0:b>
   </a>
</foo>

To get rid of the namespace prefix ns0, I do the following before parsing the document:

ET.register_namespace('', "urn:schemas-microsoft-com:asm.v1")

This gets rid of the ns0 namespace prefix, but the XML file now looks like this:

<?xml version="1.0" encoding="utf-8"?>
<foo xmlns="urn:schemas-microsoft-com:asm.v1">
   <bar>
      <bat>2</bat>
   </bar>
   <a>
      <b>
         <c>1</c>
      </b>
   </a>
</foo>

What do I do to get the output I need?

asked Jul 29 '16 16:07

Sachin Kainth

2 Answers

As far as I know, xml.etree.ElementTree offers no way to achieve this with its own methods. After digging into the xml.etree source code and the XML specification, I found that the library's behaviour is neither wrong nor unreasonable; it simply cannot produce the output you are looking for.
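To illustrate why the ns0 output is not wrong, here is a small check using only the standard library. It shows that the desired file and the ns0 version from the question parse to the same namespace-qualified tags. The shortened documents and the helper name qualified_tags are made up for this sketch.

```python
from xml.etree import ElementTree as ET

URI = "urn:schemas-microsoft-com:asm.v1"

# Shortened versions of the desired file and of the ns0 output.
original = f'<foo><a><b xmlns="{URI}"><c>1</c></b></a></foo>'
rewritten = (
    f'<foo xmlns:ns0="{URI}"><a><ns0:b><ns0:c>1</ns0:c></ns0:b></a></foo>'
)


def qualified_tags(xml_text):
    """Return every tag in document order, in ElementTree's {uri}local form."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


tags_original = qualified_tags(original)
tags_rewritten = qualified_tags(rewritten)
print(tags_original == tags_rewritten)
```

Both documents describe the same elements to a namespace-aware parser; only the textual placement of the declaration differs, and that placement is exactly what ElementTree does not keep.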

To get that output with this library, you have to customize the rendering. I wrote the following render function to suit your needs.

from re import findall, sub
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def render(root, buffer='', namespaces=None, level=0, indent_size=2, encoding='utf-8'):
    # Emit the XML declaration only on the top-level call.
    buffer += f'<?xml version="1.0" encoding="{encoding}" ?>\n' if not level else ''
    element = root.getroot() if isinstance(root, ET.ElementTree) else root
    # ET._namespaces is a private helper; it returns (qnames, {uri: prefix}).
    if not level:
        _, namespaces = ET._namespaces(element)
    indent = ' ' * indent_size * level
    # Strip the {uri} part; this assumes the namespace is registered as the default ('').
    tag = sub(r'({[^}]+}\s*)*', '', element.tag)
    buffer += f'{indent}<{tag}'
    # Declare each namespace once, on the first element that uses it.
    for ns in findall(r'{[^}]+}', element.tag):
        ns_key = ns[1:-1]
        if ns_key not in namespaces:
            continue
        prefix = namespaces[ns_key]
        buffer += ' xmlns' + (f':{prefix}' if prefix else '') + f'="{ns_key}"'
        del namespaces[ns_key]
    for k, v in element.attrib.items():
        buffer += f' {k}={quoteattr(v)}'
    buffer += '>' + (escape(element.text.strip()) if element.text else '')
    # Render only this element and recurse into its direct children;
    # looping over element.iter() here would emit every descendant twice.
    children = list(element)
    for child in children:
        sep = '\n' if buffer[-1] != '\n' else ''
        buffer += sep + render(child, level=level+1, indent_size=indent_size, namespaces=namespaces)
    buffer += f'{indent}</{tag}>\n' if children else f'</{tag}>\n'
    return buffer

Pass your XML input to the render() function above as follows:

data = '''<?xml version="1.0" encoding="utf-8"?>
<foo>
   <bar>
      <bat>1</bat>
   </bar>
   <a>
      <b xmlns="urn:schemas-microsoft-com:asm.v1">
         <c>1</c>
      </b>
   </a>
</foo>'''

root = ET.ElementTree(ET.fromstring(data))
ET.register_namespace('', "urn:schemas-microsoft-com:asm.v1")
print(render(root))

It prints the output you are looking for:

<?xml version="1.0" encoding="utf-8" ?>
<foo>
  <bar>
    <bat>1</bat>
  </bar>
  <a>
    <b xmlns="urn:schemas-microsoft-com:asm.v1">
      <c>1</c>
    </b>
  </a>
</foo>
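A note on the moving parts: render leans on ET._namespaces, a private helper of xml.etree.ElementTree, so it may change between Python versions. The sketch below shows what it returns on CPython as far as I can tell: a pair whose second item maps each namespace URI to the prefix that will be used. In a fresh interpreter the first call should give an auto-generated prefix such as ns0, and the second an empty prefix. The shortened document is made up for the sketch.

```python
from xml.etree import ElementTree as ET

URI = "urn:schemas-microsoft-com:asm.v1"
root = ET.fromstring(f'<foo><b xmlns="{URI}"><c>1</c></b></foo>')

# Private API: returns (qnames, namespaces); namespaces maps uri -> prefix.
_, before = ET._namespaces(root)

# Registering the URI with an empty prefix changes the lookup result,
# even though the document was parsed before the registration.
ET.register_namespace('', URI)
_, after = ET._namespaces(root)

print(before)
print(after)
```

This is why register_namespace can be called after parsing in the example above, as long as it runs before render.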

answered Oct 19 '22 11:10

Giova

Using the lxml package can help solve your problem. An example with the original and modified XML files and the Python code (using lxml), which leaves the namespace and XML structure unchanged, is provided here: example with namespace/xml structure unchanged
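For reference, a minimal sketch of the lxml approach, assuming the third-party lxml package is installed. It uses the XML from the question as an in-memory byte string rather than a file; this is not the linked example, only an illustration of the same idea.

```python
from lxml import etree  # third-party package: pip install lxml

data = b'''<?xml version="1.0" encoding="utf-8"?>
<foo>
   <bar>
      <bat>1</bat>
   </bar>
   <a>
      <b xmlns="urn:schemas-microsoft-com:asm.v1">
         <c>1</c>
      </b>
   </a>
</foo>'''

root = etree.fromstring(data)

# bar and bat are not in any namespace, so a plain path finds them.
root.find('bar/bat').text = '2'

# lxml keeps each namespace declaration on the element where it was parsed.
out = etree.tostring(root, xml_declaration=True, encoding='utf-8')
print(out.decode('utf-8'))
```

For a file on disk, the equivalent would be etree.parse(filePath) followed by tree.write(filePath, xml_declaration=True, encoding='utf-8'). The quoting style of the XML declaration may differ from the original file.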

answered Oct 19 '22 13:10

XYZ
