I have an XML file as follows:
<Person>
<name>
My Name
</name>
<Address>My Address</Address>
</Person>
The <name> tag has extra newlines. Is there a quick, Pythonic way to trim them and generate a new XML file?
I found this, but it only trims the whitespace between tags, not the whitespace inside the values: https://skyl.org/log/post/skyl/2010/04/remove-insignificant-whitespace-from-xml-string-with-python/
Update 1 - Handle the following XML, which has tail whitespace inside the <name> tag:
<Person>
<name>
My Name<shortname>My</shortname>
</name>
<Address>My Address</Address>
</Person>
The accepted answer handles both kinds of XML shown above.
Update 2 - I have posted my own version in an answer below. I use it to remove all kinds of whitespace and write pretty-printed XML to a file with an XML encoding declaration:
https://stackoverflow.com/a/19396130/973699
With lxml you can iterate over all elements and, for each one that has text, strip() it:
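from lxml import etree

tree = etree.parse('xmlfile')
root = tree.getroot()

# Visit every element and trim the whitespace around its text
for elem in root.iter('*'):
    if elem.text is not None:
        elem.text = elem.text.strip()

# encoding="unicode" returns a str, so it prints as text on Python 3
print(etree.tostring(root, encoding="unicode"))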
It yields:
<Person><name>My Name</name>
<Address>My Address</Address>
</Person>
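The newlines that remain after </name> and </Address> are not part of any element's text; they are stored as the tail of the preceding element. As a rough illustration, here is a minimal sketch using the standard library's xml.etree.ElementTree instead of lxml (it exposes the same .text and .tail attributes, so it runs without lxml installed). The variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

# The XML from the question, written as a string instead of a file
xml_source = (
    "<Person>\n"
    "<name>\nMy Name\n</name>\n"
    "<Address>My Address</Address>\n"
    "</Person>"
)
root = ET.fromstring(xml_source)

name = root.find("name")
address = root.find("Address")

# .text is what sits between the opening tag and the first child (or closing tag)
name_text = name.text
# .tail is what follows the element's closing tag, up to the next tag
name_tail = name.tail
address_tail = address.tail

print(repr(name_text), repr(name_tail), repr(address_tail))
```

Stripping only .text therefore cleans up the values but leaves the line breaks between sibling elements in place.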
UPDATE: to strip the tail text too:
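from lxml import etree

tree = etree.parse('xmlfile')
root = tree.getroot()

for elem in root.iter('*'):
    # Text between the opening tag and the first child (or closing tag)
    if elem.text is not None:
        elem.text = elem.text.strip()
    # Text after the closing tag, up to the next tag
    if elem.tail is not None:
        elem.tail = elem.tail.strip()

# With an explicit encoding, tostring() returns bytes that include the XML declaration
print(etree.tostring(root, encoding="utf-8", xml_declaration=True))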
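To try the same idea without a file on disk, here is a small self-contained sketch. It is not the code above verbatim: it swaps lxml for the standard library's xml.etree.ElementTree (same .text/.tail model), works on a string, and wraps the loop in a hypothetical helper named strip_whitespace. The sample is the Update 1 XML, assuming the closing tag was meant to be </shortname>.

```python
import xml.etree.ElementTree as ET


def strip_whitespace(xml_source: str) -> str:
    """Return xml_source with leading/trailing whitespace removed from all text and tails."""
    root = ET.fromstring(xml_source)
    for elem in root.iter():
        if elem.text is not None:
            elem.text = elem.text.strip()
        if elem.tail is not None:
            elem.tail = elem.tail.strip()
    # encoding="unicode" makes tostring() return a str instead of bytes
    return ET.tostring(root, encoding="unicode")


sample = (
    "<Person>\n"
    "<name>\nMy Name<shortname>My</shortname>\n</name>\n"
    "<Address>My Address</Address>\n"
    "</Person>"
)
result = strip_whitespace(sample)
print(result)
```

One caveat: in mixed content the tail can carry meaningful spaces. For example, `<p>Hello <b>big</b> world</p>` would come out as `<p>Hello<b>big</b>world</p>`, so this approach suits data-style XML better than document-style markup.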