
xsltproc doesn't select elements by name

Tags:

xslt

I am trying to transform XHTML using an XSLT stylesheet, but I can't even get a basic stylesheet to match anything. I'm sure I'm missing something simple.

Here's my XHTML source document (no big surprises):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Windows (vers 25 March 2009), see www.w3.org" />
...
</body>
</html>

The actual contents don't matter too much, as I'll demonstrate below. By the way, I'm pretty sure the document is well-formed since it was created via tidy -asxml.

My more complex XPath expressions were not returning any results, so as a sanity test, I'm trying to transform it very simply using the following stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" omit-xml-declaration="yes" indent="no"/>
    <xsl:template match="/">
        <xsl:text>---[</xsl:text>
        <xsl:for-each select="html">
            <xsl:text>Found HTML element.</xsl:text>
        </xsl:for-each>
        <xsl:text>]---</xsl:text>
    </xsl:template>
</xsl:stylesheet>

The transform is done via xsltproc --nonet stylesheet.xsl input.html, and the output is: "---[]---" (i.e., it didn't find a child element named html). However, if I change the for-each section to:

<xsl:for-each select="*">
    <xsl:value-of select="name()"/>
</xsl:for-each>

Then I get "---[html]---". And similarly, if I use for-each select="*/*" I get "---[headbody]---" as I would expect.

Why can it find the child element via * (with name() giving the correct name) but it won't find it using the element name directly?

Tadmas asked Oct 17 '10 18:10


3 Answers

The html element in your source XML declares a default namespace (http://www.w3.org/1999/xhtml). You have to declare that namespace with a prefix in your xsl:stylesheet element and use that prefix in your select expression:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:html="http://www.w3.org/1999/xhtml">
    <xsl:output method="text" omit-xml-declaration="yes" indent="no"/>
    <xsl:template match="/">
        <xsl:text>---[</xsl:text>
        <xsl:for-each select="html:html">
            <xsl:text>Found HTML element.</xsl:text>
        </xsl:for-each>
        <xsl:text>]---</xsl:text>
    </xsl:template>
</xsl:stylesheet>

Frédéric Hamidi answered Nov 14 '22 02:11


Change your stylesheet from:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 
    <xsl:output method="text" omit-xml-declaration="yes" indent="no"/> 
    <xsl:template match="/"> 
        <xsl:text>---[</xsl:text> 
        <xsl:for-each select="html"> 
            <xsl:text>Found HTML element.</xsl:text> 
        </xsl:for-each> 
        <xsl:text>]---</xsl:text> 
    </xsl:template> 
</xsl:stylesheet> 

to:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:x="http://www.w3.org/1999/xhtml"
> 
    <xsl:output method="text" omit-xml-declaration="yes" indent="no"/> 
    <xsl:template match="/"> 
        <xsl:text>---[</xsl:text> 
        <xsl:for-each select="x:html"> 
            <xsl:text>Found HTML element.</xsl:text> 
        </xsl:for-each> 
        <xsl:text>]---</xsl:text> 
    </xsl:template> 
</xsl:stylesheet> 

Explanation:

The XML document declares a default namespace, "http://www.w3.org/1999/xhtml", on its top element. That element and every unprefixed element that descends from it belong to this namespace.

In XPath 1.0, on the other hand, an unprefixed name in an expression always refers to "no namespace"; a default namespace declaration, whether in the source document or in the stylesheet, has no effect on it.

Therefore, the <xsl:for-each select="html"> instruction selects html elements that belong to "no namespace", and the document contains none: its only html element belongs to the XHTML namespace.
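
As a quick check of the explanation above, the for-each from the question can be extended to print the namespace URI next to each name. This is only a sketch that reuses the question's own template; the braces are literal text added for readability.

```xml
<xsl:for-each select="*">
    <!-- name() is the name as written in the source document -->
    <xsl:value-of select="name()"/>
    <xsl:text> {</xsl:text>
    <!-- namespace-uri() is the namespace the element actually belongs to -->
    <xsl:value-of select="namespace-uri()"/>
    <xsl:text>}</xsl:text>
</xsl:for-each>
```

For the document in the question this should print "---[html {http://www.w3.org/1999/xhtml}]---", which shows that the element is named html but is not in "no namespace".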

Solution:

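Names that belong to a default namespace cannot be referenced unprefixed in XPath. Therefore, we need to bind a prefix to the namespace the element belongs to. If this prefix is "x", we can reference any such element as "x:name", for example "x:html". The prefix only has to be bound to the same namespace URI; it does not have to match the prefix (or lack of one) used in the source document.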
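
The same rule applies to every element step of a longer path, but not to unprefixed attributes, which are always in "no namespace". A minimal sketch, assuming the head of the source document contains the meta element shown in the question:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:x="http://www.w3.org/1999/xhtml">
    <xsl:output method="text"/>
    <xsl:template match="/">
        <!-- Each element step carries the x: prefix;
             the @name and @content attribute steps stay unprefixed. -->
        <xsl:value-of select="x:html/x:head/x:meta[@name='generator']/@content"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

Run the same way as in the question (xsltproc --nonet stylesheet.xsl input.html), this should print the generator string written by tidy.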

Dimitre Novatchev answered Nov 14 '22 02:11


A workaround that avoids declaring the namespace, so that the stylesheet accepts an html element regardless of its namespace:

<xsl:template match="*[name()='html']" >
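
One caveat with this workaround: name() returns the name as written in the source, prefix included, so it would not match a document that writes the element as, say, h:html. A variant using local-name() ignores the prefix. The sketch below applies it to the sanity-test stylesheet from the question:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:template match="/">
        <xsl:text>---[</xsl:text>
        <!-- local-name() is the name without any prefix, so this matches
             html in the XHTML namespace, in another namespace, or in none -->
        <xsl:for-each select="*[local-name()='html']">
            <xsl:text>Found HTML element.</xsl:text>
        </xsl:for-each>
        <xsl:text>]---</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

The trade-off is that the match no longer distinguishes namespaces at all, so declaring a prefix is usually the safer choice when the namespace is known.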

Jarekczek answered Nov 14 '22 03:11