
xml.etree.ElementTree.ParseError -- exception handling not catching errors

I'm trying to parse an XML document that contains a number of undefined entities. They cause a ParseError when I run my code, which is as follows:

import xml.etree.ElementTree as ET

tree = ET.parse('cic.fam_lat.xml')
root = tree.getroot()

while True:
    try:
        for name in root.iter('name'):
            print(root.tag, name.text)
    except xml.etree.ElementTree.ParseError:
        pass

for name in root.iter('name'):
    print(name.text)

An example of said error is as follows, and there are a number of undefined entities that will all throw the same error: [image: error description]

I just want to ignore them rather than go in and edit out each one. How should I edit my exception handling to catch these error instances? (i.e., what am I doing wrong?)

Daniel asked Dec 21 '17 04:12



1 Answer

There are some workarounds, like defining custom entities, suggested at:

  • Python ElementTree support for parsing unknown XML entities?
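As a rough illustration of the custom-entity idea, one option is to prepend an internal DTD that declares the missing entities before handing the text to the standard-library parser. This is only a sketch under stated assumptions: the sample document, the `ENTITIES` table, and the helper `with_entity_declarations` are hypothetical, and the input is assumed to have no XML declaration or DOCTYPE of its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a document that uses &eacute; without declaring it.
XML_TEXT = '<root><name>caf&eacute;</name></root>'

# Assumed entity table; list every entity the real file actually uses.
ENTITIES = {'eacute': '&#233;'}


def with_entity_declarations(xml_text, root_tag, entities):
    """Prefix xml_text with a DOCTYPE whose internal subset declares entities.

    Assumes xml_text has no XML declaration and no DOCTYPE already.
    """
    decls = ''.join(
        '<!ENTITY {} "{}">'.format(name, value)
        for name, value in entities.items()
    )
    return '<!DOCTYPE {} [{}]>{}'.format(root_tag, decls, xml_text)


root = ET.fromstring(with_entity_declarations(XML_TEXT, 'root', ENTITIES))
names = [name.text for name in root.iter('name')]
print(names)
```

The drawback is that every entity has to be listed up front, which is why a recovering parser may be more convenient when the set of entities is unknown.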

But, if you are able to switch to lxml, its XMLParser() can work in the "recover" mode that would "ignore" the undefined entities:

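import lxml.etree as ET

# recover=True lets the parser continue past errors such as undefined entities
parser = ET.XMLParser(recover=True)
tree = ET.parse('cic.fam_lat.xml', parser=parser)
root = tree.getroot()  # the loop below needs the root element

for name in root.iter('name'):
    print(root.tag, name.text)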

(worked for me - got the tag names and texts printed)
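As for why the original try/except never fires: with the standard-library parser, the ParseError is raised while the document is being parsed, not later while iterating over the tree, so the handler has to wrap the parse call. Also, after `import xml.etree.ElementTree as ET` the name `xml` is not bound, so the exception is spelled `ET.ParseError`. A minimal sketch, using a hypothetical in-memory sample and a hypothetical helper `parse_or_none`:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: &eacute; is not declared anywhere in the document.
SAMPLE = '<root><name>caf&eacute;</name></root>'


def parse_or_none(xml_text):
    """Return the root element, or None if the text cannot be parsed."""
    try:
        # The undefined entity is reported here, during parsing.
        return ET.fromstring(xml_text)
    except ET.ParseError as err:
        print('parse failed:', err)
        return None


root = parse_or_none(SAMPLE)
if root is not None:
    for name in root.iter('name'):
        print(root.tag, name.text)
```

Catching the exception only reports the failure. The standard-library parser stops at the first error, so it cannot skip the bad entity and carry on, which is where lxml's recover mode helps.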

alecxe answered Oct 13 '22 04:10