
Python: how to strip whitespace from XML text nodes

I have an XML file as follows:

<Person>
<name>

 My Name

</name>
<Address>My Address</Address>
</Person>

The <name> tag has extra newlines. Is there a quick, Pythonic way to trim them and generate a new XML file?

I found this, but it only trims the whitespace between tags, not the whitespace inside the values: https://skyl.org/log/post/skyl/2010/04/remove-insignificant-whitespace-from-xml-string-with-python/

Update 1 - Handle the following XML, which has tail whitespace inside the <name> tag:

<Person>
<name>

 My Name<shortname>My</shortname>

</name>
<Address>My Address</Address>
</Person>

The accepted answer handles both kinds of XML above.

Update 2 - I have posted my own version in an answer below. I use it to remove all kinds of whitespace and write pretty-printed XML to a file with the XML encoding declared:

https://stackoverflow.com/a/19396130/973699

asked Dec 15 '22 05:12 by DevC


1 Answer

With lxml you can iterate over all elements and call strip() on the text of each element that has any:

from lxml import etree

tree = etree.parse('xmlfile')
root = tree.getroot()

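# Visit every element and trim the text that directly follows its opening tag.
for elem in root.iter('*'):
    if elem.text is not None:
        elem.text = elem.text.strip()

# encoding="unicode" returns a str, so Python 3 prints real newlines instead of a bytes repr.
print(etree.tostring(root, encoding="unicode"))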

It yields:

<Person><name>My Name</name>
<Address>My Address</Address>
</Person>
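For readers without lxml installed, the same loop can be sketched with the standard library's xml.etree.ElementTree. This is a swapped-in technique, not the code from the answer: the XML constant holds the question's first sample inline, and strip_text is a hypothetical helper name chosen for illustration.

```python
import xml.etree.ElementTree as ET

# The first sample document from the question, inlined so the sketch is self-contained.
XML = """<Person>
<name>

 My Name

</name>
<Address>My Address</Address>
</Person>"""


def strip_text(root):
    """Strip leading/trailing whitespace from every element's .text, in place.

    Hypothetical helper; it mirrors the lxml loop but leaves .tail untouched.
    """
    for elem in root.iter():
        if elem.text is not None:
            elem.text = elem.text.strip()
    return root


root = strip_text(ET.fromstring(XML))
result = ET.tostring(root, encoding="unicode")
print(result)
```

The newlines that remain in the output are tail text (the whitespace after each closing tag), which this loop deliberately does not touch.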

UPDATE: to strip the tail text as well:

from lxml import etree

tree = etree.parse('xmlfile')
root = tree.getroot()

for elem in root.iter('*'):
    if elem.text is not None:
        elem.text = elem.text.strip()
    if elem.tail is not None:
        elem.tail = elem.tail.strip()

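# tostring() returns bytes when an explicit encoding is given; decode before printing on Python 3.
print(etree.tostring(root, encoding="utf-8", xml_declaration=True).decode("utf-8"))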
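As a rough illustration of what stripping both text and tail does to the Update 1 document, here is a standard-library sketch (xml.etree.ElementTree rather than lxml). It assumes the sample's closing tag is meant to be </shortname>. The strip_whitespace helper and the in-memory buffer are hypothetical stand-ins for illustration, and ET.indent is only available on Python 3.9 and later, so the call is guarded.

```python
import io
import xml.etree.ElementTree as ET

# Update 1 sample from the question, assuming the closing tag is </shortname>.
XML = """<Person>
<name>

 My Name<shortname>My</shortname>

</name>
<Address>My Address</Address>
</Person>"""


def strip_whitespace(root):
    """Strip surrounding whitespace from .text and .tail of every element, in place."""
    for elem in root.iter():
        if elem.text is not None:
            elem.text = elem.text.strip()
        if elem.tail is not None:
            elem.tail = elem.tail.strip()
    return root


root = strip_whitespace(ET.fromstring(XML))
compact = ET.tostring(root, encoding="unicode")
print(compact)

# Optional pretty-printing: ET.indent exists only on Python 3.9+.
if hasattr(ET, "indent"):
    ET.indent(root, space="  ")

# Serialize with an XML declaration; a BytesIO buffer stands in for a real output file.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
pretty = buffer.getvalue().decode("utf-8")
print(pretty)
```

One caveat: in mixed content, tail whitespace can be significant. Stripping it turns "Hello <b>big</b> world" into "Hello<b>big</b>world", so this approach is best suited to data-style XML like the samples here.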
answered Dec 27 '22 04:12 by Birei