
How to preserve namespaces when parsing xml via ElementTree in Python

Assume I have the following XML, which I want to modify using Python's ElementTree:

<root xmlns:prefix="URI">
  <child prefix:name="***"/>
  ...
</root> 

I'm doing some modification on the XML file like this:

import xml.etree.ElementTree as ET
tree = ET.parse('filename.xml')
# XML modification here
# save the modifications
tree.write('filename.xml')

Then the XML file looks like:

<root xmlns:ns0="URI">
  <child ns0:name="***"/>
  ...
</root>

As you can see, the namespace prefix changed to ns0. I'm aware of ET.register_namespace(), as mentioned here.

The problems with ET.register_namespace() are that:

  1. You need to know the prefix and the URI in advance.
  2. It cannot be used with a default namespace.

For example, if the XML looks like:

<root xmlns="http://uri">
    <child name="name">
    ...
    </child>
</root>

It will be transformed into something like:

<ns0:root xmlns:ns0="http://uri">
    <ns0:child name="name">
    ...
    </ns0:child>
</ns0:root>

As you can see, the default namespace is changed to ns0.
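
The behavior can be reproduced in memory, without touching a file. The following is only a minimal sketch: the SOURCE string is an assumed stand-in for the document above, and it relies on no prefix having been registered for http://uri earlier in the same process.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for the file above, held in a string instead of on disk.
SOURCE = '<root xmlns="http://uri"><child name="name"/></root>'

root = ET.fromstring(SOURCE)

# ElementTree stores names in {uri}local form; the original prefix is not kept.
print(root.tag)

# With no prefix registered for the URI, the serializer generates one (ns0).
output = ET.tostring(root, encoding='unicode')
print(output)
```

Since the parsed tree only keeps the URI, the prefix has to be supplied again at serialization time, which is why the generated ns0 appears.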

Is there any way to solve this problem with ElementTree?

amrezzd asked Jan 30 '19 11:01




1 Answer

ElementTree replaces the prefix of any namespace that has not been registered with ET.register_namespace. To preserve a namespace prefix, you need to register it before writing your modifications to a file. The following function does the job by registering all of the file's namespaces globally:

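import xml.etree.ElementTree as ET

def register_all_namespaces(filename):
    # Each 'start-ns' event yields a (prefix, uri) tuple;
    # a default namespace is reported with an empty prefix ('').
    namespaces = dict(node for _, node in ET.iterparse(filename, events=['start-ns']))
    for prefix, uri in namespaces.items():
        ET.register_namespace(prefix, uri)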

Call this function before ET.parse so that the namespace prefixes remain unchanged:

import xml.etree.ElementTree as ET
register_all_namespaces('filename.xml')
tree = ET.parse('filename.xml')
# XML modification here
# save the modifications
tree.write('filename.xml')
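
To check the approach end to end, here is a self-contained sketch that runs against a temporary file. The example.com URIs, the demo.xml file name, the modified attribute, and the returned dictionary are all assumptions made for illustration; they are not part of the original files.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def register_all_namespaces(filename):
    # Collect (prefix, uri) pairs from the file's namespace declarations.
    namespaces = dict(node for _, node in ET.iterparse(filename, events=['start-ns']))
    for prefix, uri in namespaces.items():
        ET.register_namespace(prefix, uri)
    # Returning the mapping is an addition for this demo only.
    return namespaces

# Hypothetical document with one default and one prefixed namespace.
SOURCE = (
    '<root xmlns="http://example.com/default" '
    'xmlns:prefix="http://example.com/prefixed">'
    '<child prefix:name="value"/>'
    '</root>'
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.xml')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(SOURCE)

    found = register_all_namespaces(path)
    tree = ET.parse(path)
    tree.getroot().set('modified', 'yes')  # stand-in for a real modification
    tree.write(path)

    with open(path, encoding='utf-8') as f:
        result = f.read()

print(found)
print(result)
```

Two caveats apply. ET.register_namespace changes a process-wide registry, so the registered prefixes affect every later serialization in the same program. Also, because the pairs are collected into a dict, a prefix that is bound to different URIs in different parts of one file keeps only the last binding.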
amrezzd answered Sep 20 '22 12:09