Don't convert > to &gt; in XSLT

Tags:

xslt

I have some XML that looks like

<?xml version="1.0"?>
<root>
    <![CDATA[
    > foo 
    ]]>
</root>

(Note the > sign in "> foo") and an XSLT stylesheet

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/root">
    <foo><xsl:value-of select='.'/></foo>
</xsl:template>
</xsl:stylesheet>

When I run xsltproc stylesheet.xsl data.xml I get

<?xml version="1.0"?>
<foo>

    &gt; foo

</foo>

but the output I want is

<?xml version="1.0"?>
<foo>

    > foo

</foo>

i.e. keep the ">" as it is instead of converting it to an entity. How can I accomplish this?

pafcu asked Nov 22 '10 17:11


3 Answers

@Oded, @khachik,

Try checking his desired output for well-formedness. It is indeed well-formed XML. ("Valid" is not even a question here, as there is no schema.)

It is a common misconception that ">" is not legal in well-formed XML. In most contexts, "<" is not legal, but ">" is legal everywhere with one rare exception. The relevant paragraph of the spec:

The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and MUST, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.

With XSLT 2.0, the "right" way to do what you want is to use <xsl:character-map>. With XSLT 1.0, I think the only way to force the use of ">" in the output is to use disable-output-escaping, as @khachik suggested. Note however that XSLT processors are not required to honor DOE or character maps, and some can't (e.g. if they're in a pipeline and are not connected to serialization). But you probably know by now whether yours can, and if it can't, you'll need to handle serialization issues at the end of the pipeline.
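
For the XSLT 2.0 route, a character map might look like the sketch below. It assumes an XSLT 2.0 processor that honors character maps (Saxon, for example); xsltproc implements only XSLT 1.0, so it will not accept this stylesheet. The map name keep-gt is an arbitrary name chosen for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask the serializer to apply the character map declared below.
       "keep-gt" is a made-up name, not something from the question. -->
  <xsl:output method="xml" use-character-maps="keep-gt"/>

  <!-- Map the character ">" to the string ">". Both attribute values are
       written as &gt; here, but the XML parser resolves them to a plain ">"
       before the XSLT processor sees them. -->
  <xsl:character-map name="keep-gt">
    <xsl:output-character character="&gt;" string="&gt;"/>
  </xsl:character-map>

  <!-- Same template as in the question; no disable-output-escaping needed. -->
  <xsl:template match="/root">
    <foo><xsl:value-of select='.'/></foo>
  </xsl:template>

</xsl:stylesheet>
```

Because a mapped character bypasses the serializer's normal escaping, the text node should come out with a literal ">", while "<" and "&" are still escaped as usual.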

However, it is worth asking, why do you want the ">" serialized as ">"? As seen in the spec, &gt; is a perfectly acceptable way to express exactly the same information as far as XML is concerned. No downstream XML consumer should know the difference or care. Do you want it for aesthetic reasons?

Update: the OP wants this because the output must be not only well-formed XML but also well-formed Literate Haskell, where each code line begins with a literal ">".

LarsH answered Dec 06 '22 08:12


Adding to the very good explanation of @LarsH:

If your XSLT processor allows DOE, then you can use:

  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/root">
        <foo><xsl:value-of select='.' disable-output-escaping="yes"/></foo>
    </xsl:template>
  </xsl:stylesheet>

and when this transformation is applied to the provided XML document:

<?xml version="1.0"?>
<root>
    <![CDATA[
    > foo
    ]]>
</root>

the wanted output is produced:

<foo>
    > foo
    </foo>

Dimitre Novatchev answered Dec 06 '22 08:12


Use <xsl:value-of select='.' disable-output-escaping="yes"/>, but the output wouldn't be well-formed XML.

Update: with > the output will still be well-formed. (With < it won't.)

khachik answered Dec 06 '22 07:12