Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to render a doctype with Python's xml.dom.minidom?

I tried:

document.doctype = xml.dom.minidom.DocumentType('html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd"')

There is no doctype in the output. How to fix without inserting it by hand?

like image 938
myfreeweb Avatar asked Dec 30 '09 14:12

myfreeweb


People also ask

What is XML DOM Minidom?

xml. dom. minidom is a minimal implementation of the Document Object Model interface, with an API similar to that in other languages. It is intended to be simpler than the full DOM and also significantly smaller.

What is XML DOM document?

The XML Document Object Model (DOM) class is an in-memory representation of an XML document. The DOM allows you to programmatically read, manipulate, and modify an XML document. The XmlReader class also reads XML; however, it provides non-cached, forward-only, read-only access.

What is DOM element in Python?

The Document Object Model, or “DOM,” is a cross-language API from the World Wide Web Consortium (W3C) for accessing and modifying XML documents. A DOM implementation presents an XML document as a tree structure, or allows client code to build such a structure from scratch.


1 Answers

You shouldn't instantiate classes from minidom directly. It's not a supported part of the API, the ownerDocument​s won't tie up and you can get some strange misbehaviours. Instead use the proper DOM Level 2 Core methods:

>>> imp= minidom.getDOMImplementation('')
>>> dt= imp.createDocumentType('html', '-//W3C//DTD XHTML 1.0 Strict//EN', 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd')

(‘DTD/xhtml1-strict.dtd’ is a commonly-used but wrong SystemId. That relative URL would only be valid inside the xhtml1 folder at w3.org.)

Now you've got a DocumentType node, you can add it to a document. According to the standard, the only guaranteed way of doing this is at document creation time:

>>> doc= imp.createDocument('http://www.w3.org/1999/xhtml', 'html', dt)
>>> print doc.toxml()
<?xml version="1.0" ?><!DOCTYPE html  PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN'  'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'><html/>

If you want to change the doctype of an existing document, that's more trouble. The DOM standard doesn't require that DocumentType nodes with no ownerDocument be insertable into a document. However some DOMs allow it, eg. pxdom. minidom kind of allows it:

>>> doc= minidom.parseString('<html xmlns="http://www.w3.org/1999/xhtml"><head/><body/></html>')
>>> dt= minidom.getDOMImplementation('').createDocumentType('html', '-//W3C//DTD XHTML 1.0 Strict//EN', 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd')
>>> doc.insertBefore(dt, doc.documentElement)
<xml.dom.minidom.DocumentType instance>
>>> print doc.toxml()
<?xml version="1.0" ?><!DOCTYPE html  PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN'  'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'><html xmlns="http://www.w3.org/1999/xhtml"><head/><body/></html>

but with bugs:

>>> doc.doctype
# None
>>> dt.ownerDocument
# None

which may or may not matter to you.

Technically, the only reliable way per the standard to set a doctype on an existing document is to create a new document and import the whole of the old document into it!

def setDoctype(document, doctype):
    imp= document.implementation
    newdocument= imp.createDocument(doctype.namespaceURI, doctype.name, doctype)
    newdocument.xmlVersion= document.xmlVersion
    refel= newdocument.documentElement
    for child in document.childNodes:
        if child.nodeType==child.ELEMENT_NODE:
            newdocument.replaceChild(
                newdocument.importNode(child, True), newdocument.documentElement
            )
            refel= None
        elif child.nodeType!=child.DOCUMENT_TYPE_NODE:
            newdocument.insertBefore(newdocument.importNode(child, True), refel)
    return newdocument
like image 114
bobince Avatar answered Sep 30 '22 17:09

bobince