
XSL character escape problem

I am writing this because I have really hit a wall and cannot move ahead. In my database I have escaped HTML like this: "&lt;p&gt;My name is Freddy and I was".

I want to show it as HTML OR strip the HTML tags in my XSL template. Both solutions will work for me and I will choose the quicker solution.

I have read several posts online but cannot find a solution. I have also tried disable-output-escaping with no success. Basically, it seems that somewhere in the XSL execution the engine is changing this &lt;p&gt; into this: &amp;lt;p&amp;gt;.

It is converting the & into &amp;. If it helps, here is my XSL code. I have tried several combinations, with and without the output tag at the top.

Any help will be appreciated. Thanks in advance.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" omit-xml-declaration="yes"/>

  <xsl:template match="DocumentElement">
    <div>
      <xsl:attribute name="id">mySlides</xsl:attribute>
      <xsl:apply-templates>
        <xsl:with-param name="templatenumber" select="0"/>
      </xsl:apply-templates>
    </div>

    <div>
      <xsl:attribute name="id">myController</xsl:attribute>
      <xsl:apply-templates>
        <xsl:with-param name="templatenumber" select="1"/>
      </xsl:apply-templates>
    </div>
  </xsl:template>

  <xsl:template match="DocumentElement/QueryResults">
    <xsl:param name="templatenumber">tobereplace</xsl:param>

    <xsl:if test="$templatenumber=0">
      <div>
        <xsl:attribute name="id">myController</xsl:attribute>
        <div>
          <xsl:attribute name="class">article</xsl:attribute>
          <h2>
            <a>
              <xsl:attribute name="class">title</xsl:attribute>
              <xsl:attribute name="title"><xsl:value-of select="Title"/></xsl:attribute>
              <xsl:attribute name="href">/stories/stories-details/articletype/articleview/articleid/<xsl:value-of select="ArticleId"/>/<xsl:value-of select="SEOTitle"/>.aspx</xsl:attribute>
              <xsl:value-of select="Title"/>
            </a>
          </h2>
          <div>
            <xsl:attribute name="style">text-indent: 25px;</xsl:attribute>
            <xsl:attribute name="class">articlesummary</xsl:attribute>
            <xsl:call-template name="removeHtmlTags">
              <xsl:with-param name="html" select="Summary" />
            </xsl:call-template>
          </div>
        </div>
      </div>
    </xsl:if>
    <xsl:if test="$templatenumber=1">
      <div>
        <xsl:attribute name="id">myController</xsl:attribute>
        <span>
          <xsl:attribute name="class">jFlowControl</xsl:attribute>
          aa
        </span>
      </div>
    </xsl:if>
  </xsl:template>

  <xsl:template name="removeHtmlTags">
    <xsl:param name="html"/>
    <xsl:choose>
      <xsl:when test="contains($html, '&lt;')">
        <xsl:value-of select="substring-before($html, '&lt;')"/>
        <!-- Recurse through HTML -->
        <xsl:call-template name="removeHtmlTags">
          <xsl:with-param name="html" select="substring-after($html, '&gt;')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$html"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
Marcos Buarque asked Mar 14 '09 16:03


1 Answer

Based on the assumption that you have this HTML string,

<p>My name is Freddy &amp; I was

then if you escape it and store it in a database it would become this:

&lt;p&gt;My name is Freddy &amp;amp; I was

Consequently, if you retrieve it as XML (without unescaping it beforehand), the result would be this:

&amp;lt;p&amp;gt;My name is Freddy &amp;amp;amp; I was

and <xsl:value-of select="." disable-output-escaping="yes" /> would produce:

&lt;p&gt;My name is Freddy &amp;amp; I was
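
The last two steps can be tried in isolation. This is a minimal sketch, assuming a one-element input document `<Summary>&amp;lt;p&amp;gt;My name is Freddy &amp;amp;amp; I was</Summary>` (a hypothetical reduction of the real data); the class names are only labels for illustration:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="Summary">
    <!-- Default behaviour: every & in the string value is escaped again on output -->
    <div class="escaped">
      <xsl:value-of select="."/>
    </div>
    <!-- Escaping disabled: the string value is written to the output as-is -->
    <div class="unescaped">
      <xsl:value-of select="." disable-output-escaping="yes"/>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

The first div repeats the markup of the input document, the second one contains the database content. Keep in mind that disable-output-escaping is an optional feature in XSLT 1.0, so a processor is allowed to ignore it.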

This is exactly what you have in your database, which is why you see the HTML tags as literal text in the output. So what you need is a mechanism that does the following string replacements:

  • "&amp;lt;" with "&lt;" (effectively changing &lt; to < in unescaped output)
  • "&amp;gt;" with "&gt;" (effectively changing &gt; to > in unescaped output)
  • "&amp;quot;" with "&quot;" (effectively changing &quot; to " in unescaped output)
  • "&amp;amp;" with "&amp;" (effectively changing &amp; to & in unescaped output)

From your XSL I have inferred the following test input XML:

<DocumentElement>
  <QueryResults>
    <Title>Article 1</Title>
    <ArticleId>1</ArticleId>
    <SEOTitle>Article_1</SEOTitle>
    <Summary>&amp;lt;p&amp;gt;Article 1 summary &amp;amp;amp; description.&amp;lt;/p&amp;gt;</Summary>
  </QueryResults>
  <QueryResults>
    <Title>Article 2</Title>
    <ArticleId>2</ArticleId>
    <SEOTitle>Article_2</SEOTitle>
    <Summary>&amp;lt;p&amp;gt;Article 2 summary &amp;amp;amp; description.&amp;lt;/p&amp;gt;</Summary>
  </QueryResults>
</DocumentElement>

I have changed the stylesheet you supplied and implemented such a replacement mechanism. If you apply the following XSLT 1.0 stylesheet to it:

<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:my="my:namespace"
  exclude-result-prefixes="my"
>

  <xsl:output method="html" omit-xml-declaration="yes"/>

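  <!-- Lookup table: maps each escaped entity (as it appears in the parsed text) to its literal character -->
  <my:unescape>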
    <my:char literal="&lt;" escaped="&amp;lt;" />
    <my:char literal="&gt;" escaped="&amp;gt;" />
    <my:char literal="&quot;" escaped="&amp;quot;" />
    <my:char literal="&amp;" escaped="&amp;amp;" />
  </my:unescape>

  <xsl:template match="DocumentElement">
    <div id="mySlides">
      <xsl:apply-templates mode="slides" />
    </div>
    <div id="myController">
      <xsl:apply-templates mode="controller" />
    </div>
  </xsl:template>

  <xsl:template match="DocumentElement/QueryResults" mode="slides">
    <div class="article">
      <h2>
        <a class="title" title="{Title}" href="{concat('/stories/stories-details/articletype/articleview/articleid/', ArticleId, '/', SEOTitle, '.aspx')}">
          <xsl:value-of select="Title"/>
        </a>
      </h2>
      <div class="articlesummary" style="text-indent: 25px;">
        <xsl:apply-templates select="document('')/*/my:unescape/my:char[1]">
          <xsl:with-param name="html" select="Summary" />
        </xsl:apply-templates>
      </div>
    </div>
  </xsl:template>

  <xsl:template match="DocumentElement/QueryResults" mode="controller">
    <span class="jFlowControl">
      <xsl:text>aa </xsl:text>
      <xsl:value-of select="Title" />
    </span>
  </xsl:template>

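  <!-- Walks the my:char list: recurses to the last entry first, then applies one replacement per entry on the way back -->
  <xsl:template match="my:char">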
    <xsl:param name="html" />
    <xsl:variable name="intermediate">
      <xsl:choose>
        <xsl:when test="following-sibling::my:char">
          <xsl:apply-templates select="following-sibling::my:char[1]">
            <xsl:with-param name="html" select="$html" />
          </xsl:apply-templates>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="$html" disable-output-escaping="yes" />
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <xsl:call-template name="unescape">
      <xsl:with-param name="html" select="$intermediate" />
    </xsl:call-template>
  </xsl:template>

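  <!-- Replaces every occurrence of @escaped with @literal; the context node (the current my:char) is inherited from the caller -->
  <xsl:template name="unescape">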
    <xsl:param name="html" />
    <xsl:choose>
      <xsl:when test="contains($html, @escaped)">
        <xsl:value-of select="substring-before($html, @escaped)" disable-output-escaping="yes"/>
        <xsl:value-of select="@literal" disable-output-escaping="yes" />
        <xsl:call-template name="unescape">
          <xsl:with-param name="html" select="substring-after($html, @escaped)"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$html" disable-output-escaping="yes"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

Then this output HTML is produced:

<div id="mySlides">
  <div class="article">
    <h2>
      <a class="title" title="Article 1" href="/stories/stories-details/articletype/articleview/articleid/1/Article_1.aspx">Article 1</a>
    </h2>
    <div class="articlesummary" style="text-indent: 25px;">
      <p>Article 1 summary &amp; description.</p>
    </div>
  </div>
  <div class="article">
    <h2>
      <a class="title" title="Article 2" href="/stories/stories-details/articletype/articleview/articleid/2/Article_2.aspx">Article 2</a>
    </h2>
    <div class="articlesummary" style="text-indent: 25px;">
      <p>Article 2 summary &amp; description.</p>
    </div>
  </div>
</div>
<div id="myController">
  <span class="jFlowControl">aa Article 1</span>
  <span class="jFlowControl">aa Article 2</span>
</div>

Note

  • the use of a temporary namespace and embedded elements (<my:unescape>) to create a list of characters to replace
  • the use of recursion to emulate an iterative replacement of all affected characters in the input
  • the use of the implicit context within the unescape template to carry the information about which character is currently being replaced
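
The lookup-table idiom from the first bullet can also be tried on its own. A minimal sketch, using made-up my:table and my:entry elements; it ignores the input document entirely, so any well-formed XML will do as input:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:my="my:namespace"
  exclude-result-prefixes="my">

  <xsl:output method="text"/>

  <!-- Top-level elements in a non-XSLT namespace are ignored by the processor,
       so they can carry data. my:table and my:entry are hypothetical names. -->
  <my:table>
    <my:entry key="lt" value="less than"/>
    <my:entry key="gt" value="greater than"/>
  </my:table>

  <xsl:template match="/">
    <!-- document('') loads the stylesheet itself as a source tree -->
    <xsl:for-each select="document('')/*/my:table/my:entry">
      <xsl:value-of select="concat(@key, ' = ', @value, '&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

One caveat: document('') is resolved against the base URI of the stylesheet, so it may fail when the stylesheet is loaded from a string or a stream that has no URI.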

Furthermore note:

  • the use of template modes to get different output for the same input (this replaces your templatenumber parameter)
  • most of the time there is no need for <xsl:attribute> elements. They can safely be replaced by inline notation (attributename="{attributevalue}")
  • the use of the concat() function to create the URL
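
For the second bullet, here is a small side-by-side sketch, assuming the QueryResults input shown earlier. Both a elements come out with the same attributes; the curly braces mark an attribute value template, whose content is evaluated as XPath:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Suppress all other text so only the two links are written -->
  <xsl:template match="text()"/>

  <xsl:template match="QueryResults">
    <!-- Verbose form, as in the question -->
    <a>
      <xsl:attribute name="class">title</xsl:attribute>
      <xsl:attribute name="title"><xsl:value-of select="Title"/></xsl:attribute>
      <xsl:value-of select="Title"/>
    </a>
    <!-- Inline form with an attribute value template -->
    <a class="title" title="{Title}">
      <xsl:value-of select="Title"/>
    </a>
  </xsl:template>

</xsl:stylesheet>
```

<xsl:attribute> is still the right tool when an attribute has to be added conditionally or its name is computed.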

Generally speaking, it is a bad idea to store escaped HTML in a database (and, more generally still, to store HTML in a database at all). It sets you up for all kinds of problems, this being one of them. If you can't change this setup, I hope that the solution helps you.

I cannot guarantee that it does the right thing in all situations, and it may open up security holes (think XSS), but dealing with this was not part of the question. In any case, consider yourself warned.
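
The question also allowed for stripping the tags instead of rendering them. The removeHtmlTags template in the question searches for a literal < character, but the parsed Summary text contains the four characters &lt; instead, so it never finds anything to strip. Here is a sketch of a variant that searches for the escaped sequences. The name removeEscapedHtmlTags is made up, and the sketch assumes the double-escaped input shown above:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Suppress all other text so only the summaries are written -->
  <xsl:template match="text()"/>

  <xsl:template match="Summary">
    <div class="articlesummary">
      <xsl:call-template name="removeEscapedHtmlTags">
        <xsl:with-param name="html" select="string(.)"/>
      </xsl:call-template>
    </div>
  </xsl:template>

  <xsl:template name="removeEscapedHtmlTags">
    <xsl:param name="html"/>
    <!-- Everything after the first escaped opening bracket -->
    <xsl:variable name="afterOpen" select="substring-after($html, '&amp;lt;')"/>
    <xsl:choose>
      <!-- Only strip when an escaped closing bracket follows the opening one -->
      <xsl:when test="contains($html, '&amp;lt;') and contains($afterOpen, '&amp;gt;')">
        <xsl:value-of select="substring-before($html, '&amp;lt;')"/>
        <xsl:call-template name="removeEscapedHtmlTags">
          <xsl:with-param name="html" select="substring-after($afterOpen, '&amp;gt;')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$html"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

This only removes the tags. Entities inside the text, such as the escaped ampersand in the sample summaries, are not decoded and would still need the replacement mechanism.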

I need a break now. ;-)

Tomalak answered Oct 23 '22 03:10