I have the following Python code:
import xml.dom.minidom
import xml.parsers.expat
try:
    domTree = xml.dom.minidom.parse(myXMLFileName)
except xml.parsers.expat.ExpatError as e:
    return e.args[0]
which I am using to parse an XML file. Although it quite happily spots simple XML errors like mismatched tags, it completely ignores the DTD specified at the top of the XML file:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE ServerConfig SYSTEM "ServerConfig.dtd">
so it doesn't notice when mandatory elements are missing, for example. How can I switch on DTD checking?
See this question - the accepted answer is to use lxml validation.
Just by way of explanation: Python xml.dom.minidom and xml.sax use the expat parser by default, which is a non-validating parser. It may read the DTD in order to do entity replacement, but it won't validate against the DTD.
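To see the non-validating behaviour directly, here is a minimal sketch using only the standard library. The inline DTD and the `Port` element are made up for illustration (they are not from the question's ServerConfig.dtd): the DTD requires a `Port` child, the document omits it, and minidom still parses it without complaint.

```python
import xml.dom.minidom

# Hypothetical document: the internal DTD says <ServerConfig> must contain
# exactly one <Port>, but the root element below is empty.
INVALID_XML = """<?xml version="1.0"?>
<!DOCTYPE ServerConfig [
  <!ELEMENT ServerConfig (Port)>
  <!ELEMENT Port (#PCDATA)>
]>
<ServerConfig/>
"""

# expat checks well-formedness only, so no exception is raised here
# even though the document is invalid according to its DTD.
dom = xml.dom.minidom.parseString(INVALID_XML)
root_name = dom.documentElement.tagName
child_count = len(dom.documentElement.childNodes)
print(root_name, child_count)
```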
gimel and Tim recommend lxml, which is a nicely pythonic binding for the libxml2 and libxslt libraries. It supports validation against a DTD. I've been using lxml, and I like it a lot.
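A rough sketch of what DTD validation with lxml can look like, assuming lxml is installed. The DTD text and the `Port` element are again invented for illustration; in the question's case the DTD would come from ServerConfig.dtd instead.

```python
from io import StringIO

from lxml import etree

# Hypothetical DTD: <ServerConfig> must contain exactly one <Port>.
dtd = etree.DTD(StringIO(
    "<!ELEMENT ServerConfig (Port)>\n"
    "<!ELEMENT Port (#PCDATA)>\n"
))

valid_doc = etree.fromstring("<ServerConfig><Port>8080</Port></ServerConfig>")
invalid_doc = etree.fromstring("<ServerConfig/>")  # mandatory <Port> missing

# validate() returns a bool; details of a failure end up in dtd.error_log.
valid_ok = dtd.validate(valid_doc)
invalid_ok = dtd.validate(invalid_doc)

print(valid_ok, invalid_ok)
for entry in dtd.error_log:
    print(entry.message)
```

If you would rather have the parser validate against the DTD named in the file's own DOCTYPE, lxml also accepts `etree.XMLParser(dtd_validation=True)`, which you pass as the second argument to `etree.parse`; an invalid document should then raise `etree.XMLSyntaxError` during parsing.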