Remove namespace and prefix from xml in python using lxml

I have an XML file that I need to open and modify. One of the changes is to remove the namespace and prefix, then save the result to another file. Here is the XML:

<?xml version='1.0' encoding='UTF-8'?>
<package xmlns="http://apple.com/itunes/importer">
  <provider>some data</provider>
  <language>en-GB</language>
</package>

I can make the other changes I need, but I can't work out how to remove the namespace and prefix. This is the resulting XML I need:

<?xml version='1.0' encoding='UTF-8'?>
<package>
  <provider>some data</provider>
  <language>en-GB</language>
</package>

And here is my script which will open and parse the xml and save it:

from lxml import etree

metadata = '/Users/user1/Desktop/Python/metadata.xml'

parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(metadata, parser)
root = tree.getroot()

tree.write('/Users/user1/Desktop/Python/done.xml',
           pretty_print=True, xml_declaration=True, encoding='UTF-8')

How can I extend this script so that it removes the namespace and prefix?

asked Aug 10 '13 06:08

speedyrazor

2 Answers

We can get the desired output document in two steps:

1. Remove namespace URIs from element names.
2. Remove unused namespace declarations from the XML tree.

Example code

from lxml import etree

input_xml = """
<package xmlns="http://apple.com/itunes/importer">
  <provider>some data</provider>
  <language>en-GB</language>
  <!-- some comment -->
  <?xml-some-processing-instruction ?>
</package>
"""
root = etree.fromstring(input_xml)

# Iterate through all XML elements
for elem in root.iter():
    # Skip comments and processing instructions,
    # because they do not have names
    if not isinstance(elem, (etree._Comment, etree._ProcessingInstruction)):
        # Remove a namespace URI in the element's name
        elem.tag = etree.QName(elem).localname

# Remove unused namespace declarations
etree.cleanup_namespaces(root)

print(etree.tostring(root).decode())

Output XML

<package>
  <provider>some data</provider>
  <language>en-GB</language>
  <!-- some comment -->
  <?xml-some-processing-instruction ?>
</package>

Details explaining the code

As described in the documentation, we use lxml.etree.QName.localname to get the local names of elements, that is, names without namespace URIs. We then replace each element's fully qualified name with its local name.
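To make the role of QName concrete, here is a minimal sketch. The namespace URI comes from the question's XML; the variable names are only illustrative.

```python
from lxml import etree

# Fully qualified name in lxml's "{uri}localname" notation.
qualified = "{http://apple.com/itunes/importer}package"

qname = etree.QName(qualified)
print(qname.namespace)  # the namespace URI part
print(qname.localname)  # the name without the URI

# QName also accepts an element and reads its tag.
root = etree.fromstring(
    '<package xmlns="http://apple.com/itunes/importer"><provider/></package>'
)
local_names = [etree.QName(elem).localname for elem in root.iter()]
print(local_names)
```

Assigning one of these local names back to elem.tag is what detaches the element from its namespace.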

Some XML nodes, such as comments and processing instructions, do not have names. We have to skip these nodes while replacing element names; otherwise a ValueError is raised.
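The following sketch shows that failure in isolation. The sample document and the names named and unnamed are made up for illustration. It relies on the same private classes (etree._Comment, etree._ProcessingInstruction) that the example code uses.

```python
from lxml import etree

# Hypothetical sample with one comment and one processing instruction.
root = etree.fromstring("<root><!-- a comment --><?target data?><child/></root>")

named, unnamed = [], []
for node in root.iter():
    try:
        named.append(etree.QName(node).localname)
    except ValueError:
        # Raised for nodes whose tag is not a string.
        unnamed.append(node)

print(named)
print([type(node).__name__ for node in unnamed])
```

Catching the exception would also work as a guard, but checking the node type up front, as the example code does, keeps the loop easier to read.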

Finally, we use lxml.etree.cleanup_namespaces() to remove unused namespace declarations from the XML tree.
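To see why the second step is needed, this sketch serializes the tree once after renaming the tags and once more after the cleanup. It uses a shortened version of the question's XML; before and after are illustrative names.

```python
from lxml import etree

root = etree.fromstring(
    '<package xmlns="http://apple.com/itunes/importer">'
    "<provider>some data</provider>"
    "</package>"
)

# Step 1 only: strip the namespace URI from every element name.
for elem in root.iter():
    if isinstance(elem.tag, str):  # comments and PIs have a non-string tag
        elem.tag = etree.QName(elem).localname

before = etree.tostring(root).decode()

# Step 2: drop declarations that nothing in the tree uses any more.
etree.cleanup_namespaces(root)
after = etree.tostring(root).decode()

print(before)
print(after)
```

Renaming the tags leaves the xmlns declaration on the root element, so the first serialization still contains it. Only the cleanup call removes it.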

answered Sep 28 '22 04:09

SergiyKolesnikov

Replace the tag as Uku Loskit suggests. In addition, use lxml.objectify.deannotate.

from lxml import etree, objectify

metadata = '/Users/user1/Desktop/Python/metadata.xml'
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(metadata, parser)
root = tree.getroot()

####
for elem in root.iter():
    if not hasattr(elem.tag, 'find'):
        continue  # guard for Comment tags
    i = elem.tag.find('}')
    if i >= 0:
        elem.tag = elem.tag[i+1:]
objectify.deannotate(root, cleanup_namespaces=True)
####

tree.write('/Users/user1/Desktop/Python/done.xml',
           pretty_print=True, xml_declaration=True, encoding='UTF-8')

Note: Some nodes, such as comments, return a function when you access their tag attribute, so the code includes a guard for that.
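A small sketch of what that guard protects against, using a made-up document. The isinstance check at the end is an alternative I am assuming is acceptable here, not part of the original answer.

```python
from lxml import etree

# Hypothetical sample: a comment, a processing instruction and an element.
root = etree.fromstring("<root><!-- note --><?target data?><child/></root>")
comment, pi, child = list(root)

# For comments and processing instructions, .tag is a factory function.
print(comment.tag is etree.Comment)
print(pi.tag is etree.ProcessingInstruction)

# Functions have no .find(), which is what the hasattr guard relies on.
print(hasattr(comment.tag, "find"), hasattr(child.tag, "find"))

# Alternative guard (assumption): keep only nodes whose tag is a string.
element_tags = [node.tag for node in root.iter() if isinstance(node.tag, str)]
print(element_tags)
```

Both guards skip the same nodes in this sample; the string check states the intent a little more directly.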

answered Sep 28 '22 05:09

falsetru

Tags: python, namespaces, xml, lxml
