I'm trying to parse an XML document that contains a number of undefined entities, which cause a ParseError when I run my code:
import xml.etree.ElementTree as ET
tree = ET.parse('cic.fam_lat.xml')
root = tree.getroot()
while True:
    try:
        for name in root.iter('name'):
            print(root.tag, name.text)
    except xml.etree.ElementTree.ParseError:
        pass
for name in root.iter('name'):
    print(name.text)
An example of said error is as follows, and there are a number of undefined entities that will all throw the same error:
I just want to ignore them rather than go in and edit out each one. How should I edit my exception handling to catch these error instances? (i.e., what am I doing wrong?)
There are some workarounds, like defining custom entities, suggested at:
But if you are able to switch to lxml, its XMLParser() can work in the "recover" mode, which would "ignore" the undefined entities:
import lxml.etree as ET
parser = ET.XMLParser(recover=True)
tree = ET.parse('cic.fam_lat.xml', parser=parser)
root = tree.getroot()
for name in root.iter('name'):
    print(root.tag, name.text)
(worked for me - got the tag names and texts printed)
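As for the exception handling itself: the ParseError is raised while the document is being parsed, not while iterating over the tree, so a try/except around the loop never sees it. Also, since the module was imported as ET, the exception has to be referred to as ET.ParseError. A minimal sketch with the standard library follows; the inline XML strings and the parse_or_none helper are hypothetical, used only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET


def parse_or_none(text):
    """Return the root element, or None if the text is not parseable."""
    try:
        # The error is raised here, during parsing.
        return ET.fromstring(text)
    except ET.ParseError as err:
        print("could not parse:", err)
        return None


# &nbsp; is undefined, so parsing this string fails.
bad_root = parse_or_none("<root><name>Cicero&nbsp;M.</name></root>")
# A document without undefined entities parses normally.
good_root = parse_or_none("<root><name>Atticus</name></root>")

if good_root is not None:
    for name in good_root.iter("name"):
        print(good_root.tag, name.text)
```

Note that catching the error only lets the program carry on; it does not produce a tree, because the standard-library parser stops at the first undefined entity. That is why the recover mode in lxml is the option that actually skips them.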