When rendering XHTML with lxml, everything works unless you happen to use Firefox, which seems unable to handle namespace-prefixed XHTML elements together with JavaScript. Opera executes the JavaScript (both jQuery and MathJax) fine whether or not the XHTML namespace has a prefix (h: in my case), but in Firefox the scripts abort with odd errors (this.head is undefined in the case of MathJax).
I know about the register_namespace function, but it accepts neither None nor "" as a namespace prefix. I've heard about _namespace_map in the lxml.etree module, but my Python complains that this attribute doesn't exist (a version issue?).
Is there any other way to remove the namespace prefix for the XHTML namespace? Note that str.replace, as suggested in the answer to another, related question, is not a method I can accept: it is not aware of XML semantics and could easily corrupt the resulting document.
As requested, here are two ready-to-use examples, one with namespace prefixes and one without. The first displays 0 in Firefox (wrong) and the second displays 1 (correct). Opera renders both correctly. This is obviously a Firefox bug, but it only serves as a rationale for wanting prefixless XHTML from lxml; there are other good reasons, such as reducing traffic for mobile clients (even h: adds up across tens or hundreds of HTML tags).
Use ElementMaker and give it an nsmap that maps None to your default namespace.
#!/usr/bin/env python
# dogeml.py
from lxml.builder import ElementMaker
from lxml import etree

E = ElementMaker(
    nsmap={
        None: "http://wow/"  # <--- This is the special sauce
    }
)

doge = E.doge(
    E.such('markup'),
    E.many('very namespaced', syntax="tricks")
)

options = {
    'pretty_print': True,
    'xml_declaration': True,
    'encoding': 'UTF-8',
}

serialized_bytes = etree.tostring(doge, **options)
print(serialized_bytes.decode(options['encoding']))
As you can see in the output from this script, the default namespace is defined, but the tags do not have a prefix.
<?xml version='1.0' encoding='UTF-8'?>
<doge xmlns="http://wow/">
  <such>markup</such>
  <many syntax="tricks">very namespaced</many>
</doge>
I have tested this code with Python 2.7.6, 3.3.5, and 3.4.0, combined with lxml 3.3.1.
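Applied to the XHTML case from the question, the same idea might look like the sketch below. Note one deliberate difference from the script above: it also passes namespace= to ElementMaker, so the elements are in the XHTML namespace in memory as well as in the serialized output. That addition, the XHTML_NS constant, and the page content are my own assumptions for illustration, not part of the tested script.

```python
from lxml import etree
from lxml.builder import ElementMaker

XHTML_NS = "http://www.w3.org/1999/xhtml"

# namespace= puts every element the maker creates into the XHTML namespace;
# nsmap={None: ...} declares that namespace as the default, so the
# serializer writes the tags without a prefix.
E = ElementMaker(namespace=XHTML_NS, nsmap={None: XHTML_NS})

page = E.html(
    E.head(E.title("MathJax Test Page")),
    E.body(E.p("test")),
)

xhtml_bytes = etree.tostring(page, encoding="UTF-8", xml_declaration=True)
print(xhtml_bytes.decode("UTF-8"))
```

With namespace= set, page.tag is the fully qualified name {http://www.w3.org/1999/xhtml}html, so XPath and find() calls that use the namespace keep working before the tree is ever serialized.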
This XSL transformation removes the prefix from every element in the XHTML namespace and makes XHTML the default namespace; elements in other namespaces (ml:foo below) keep their prefix and namespace:
import lxml.etree as ET

# bytes, not str: on Python 3, lxml rejects a str that contains an
# encoding declaration
content = b'''\
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html>
<h:html xmlns:h="http://www.w3.org/1999/xhtml" xmlns:ml="http://foo">
  <h:head>
    <h:title>MathJax Test Page</h:title>
    <h:script type="text/javascript"><![CDATA[
      function test() {
        alert(document.getElementsByTagName("p").length);
      };
    ]]></h:script>
  </h:head>
  <h:body onload="test();">
    <h:p>test</h:p>
    <ml:foo></ml:foo>
  </h:body>
</h:html>
'''
dom = ET.fromstring(content)

xslt = '''\
<xsl:stylesheet version="1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no"/>

  <!-- identity transform for everything else;
       select="@*|node()" so attributes of copied elements are kept -->
  <xsl:template match="/|comment()|processing-instruction()|*|@*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- remove NS from XHTML elements -->
  <xsl:template match="*[namespace-uri() = 'http://www.w3.org/1999/xhtml']">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@*|node()" />
    </xsl:element>
  </xsl:template>

  <!-- remove NS from XHTML attributes -->
  <xsl:template match="@*[namespace-uri() = 'http://www.w3.org/1999/xhtml']">
    <xsl:attribute name="{local-name()}">
      <xsl:value-of select="." />
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
'''
xslt_doc = ET.fromstring(xslt)
transform = ET.XSLT(xslt_doc)
dom = transform(dom)

# tostring() returns bytes when an encoding is given; decode before printing
print(ET.tostring(dom, pretty_print=True,
                  encoding='utf-8').decode('utf-8'))
yields
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>MathJax Test Page</title>
    <script type="text/javascript">
      function test() {
        alert(document.getElementsByTagName("p").length);
      };
    </script>
  </head>
  <body onload="test();">
    <p>test</p>
    <ml:foo xmlns:ml="http://foo"/>
  </body>
</html>
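Since the question worries about XML semantics, here is a small sketch of how one might check that dropping the prefix changes only the serialization, not the meaning. The two inline documents and the qualified_tags helper are hypothetical, made up for this check; the point is that lxml reports the same fully qualified names for both spellings.

```python
from lxml import etree

# The same tiny document, once with the h: prefix and once with XHTML
# as the default namespace.
prefixed = etree.fromstring(
    b'<h:html xmlns:h="http://www.w3.org/1999/xhtml">'
    b'<h:body><h:p>test</h:p></h:body></h:html>'
)
unprefixed = etree.fromstring(
    b'<html xmlns="http://www.w3.org/1999/xhtml">'
    b'<body><p>test</p></body></html>'
)


def qualified_tags(root):
    """Return the {namespace}localname of every element in document order."""
    return [el.tag for el in root.iter("*")]


same_names = qualified_tags(prefixed) == qualified_tags(unprefixed)
print(same_names)
print(prefixed.prefix, unprefixed.prefix)
```

This prints True, then h None: the element names are identical and only the prefix differs, which is why a namespace-aware rewrite is safe where a blind str.replace is not.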