
Using an XML catalog with Python's lxml?

Tags: python, xml, lxml

Is there a way, when I parse an XML document using lxml, to validate that document against its DTD using an external catalog file? I need to be able to work with the fixed attributes defined in a document's DTD.

James Sulak asked Aug 15 '08 18:08




2 Answers

You can add the catalog to the XML_CATALOG_FILES environment variable:

import os
os.environ['XML_CATALOG_FILES'] = 'file:///to/my/catalog.xml'

See this thread. Note that XML_CATALOG_FILES holds a space-separated list of URLs. To build a URL from a pathname, you can combine pathname2url (urllib.request in Python 3) with urljoin (urllib.parse), using file: as the base.
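
As a minimal sketch of the pathname-to-URL step described above: the helper names catalog_url and set_catalog_files are hypothetical, and the paths are placeholders. It assumes Python 3 and that the variable is set before lxml parses anything, which is the safe order if libxml2 reads it when it first loads its catalogs.

```python
import os
from urllib.parse import urljoin
from urllib.request import pathname2url


def catalog_url(path):
    """Turn a local pathname into a file: URL (hypothetical helper)."""
    return urljoin("file:", pathname2url(os.path.abspath(path)))


def set_catalog_files(paths):
    """Store the catalogs as a space-separated URL list in the environment."""
    value = " ".join(catalog_url(p) for p in paths)
    os.environ["XML_CATALOG_FILES"] = value
    return value


# Placeholder paths; substitute your own catalog files.
set_catalog_files(["/to/my/catalog.xml", "/to/another/catalog.xml"])
```

pathname2url percent-encodes spaces inside a path, so a space in a filename should not be mistaken for the separator between entries.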

Brecht Machiels answered Oct 01 '22 23:10



Can you give an example? According to the lxml validation docs, lxml can handle DTD validation (specified in the XML doc or externally in code) and system catalogs, which covers most cases I can think of.

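from io import StringIO
from lxml import etree
# Option 1: build the DTD from a file-like object.
dtd = etree.DTD(StringIO("<!ELEMENT b EMPTY>"))
# Option 2: look the DTD up by its public ID, resolved through the catalog.
dtd = etree.DTD(external_id="-//OASIS//DTD DocBook XML V4.2//EN")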
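
For the fixed attributes mentioned in the question, one possible approach is to let the parser validate and apply attribute defaults while it parses. The sketch below uses a made-up document with an inline DTD so that it runs without any catalog. With a real document whose DOCTYPE points at an external DTD, the same parser options would apply and the catalog would be what resolves the DTD location.

```python
from io import BytesIO
from lxml import etree

# Hypothetical document: the inline DTD declares a fixed attribute on <a>.
XML = b"""<?xml version="1.0"?>
<!DOCTYPE a [
<!ELEMENT a EMPTY>
<!ATTLIST a kind CDATA #FIXED "demo">
]>
<a/>"""

# dtd_validation validates against the DTD during parsing;
# attribute_defaults asks the parser to add defaulted attributes from the DTD.
parser = etree.XMLParser(dtd_validation=True, attribute_defaults=True)
root = etree.parse(BytesIO(XML), parser).getroot()

# The fixed attribute is readable even though the element did not spell it out.
fixed_value = root.get("kind")
```

If validation fails, the parse raises etree.XMLSyntaxError, so wrapping the etree.parse call in a try/except is a reasonable way to report invalid documents.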

Michael Twomey answered Oct 02 '22 00:10
